The complete nucleotide sequence of bacteriophage T7 DNA, 39,936 base-pairs,
has been determined by the techniques of Maxam & Gilbert. All previously known
T7 genes and several unsuspected genes have been identified in the sequence. T7
DNA carries genetic information very efficiently: the coding sequences of 50 genes
are close-packed but essentially not overlapping, and occupy almost 92% of the
nucleotide sequence. This arrangement strongly suggests that all 50 of these
close-packed genes are expressed, although there is as yet evidence for expression of
only some of them. In addition, five potential overlapping genes have been
identified, and there is preliminary evidence that one of them is expressed. Where
gaps between coding sequences are found, they usually are less than 100
base-pairs long, and usually contain one or more transcription signals, RNAase III
cleavage sites, or origins of replication. Transcription signals in the T7 DNA
include the three strong early promoters and the early termination site for
Escherichia coli RNA polymerase, and 17 promoters and one termination site for
T7 RNA polymerase. Ten RNAase III cleavage sites have been located, five in the
early region and five in the late region. The primary transcripts are processed at
these sites to provide the messenger RNAs observed in vivo. Almost all of the T7
messenger RNAs are polycistronic, but there are few polar effects at the level
of transcription or translation, and most T7 proteins seem to be initiated
independently, each from its own ribosome-binding and initiation site. The
initiation codon for most T7 proteins is AUG, but a few proteins are predicted to
begin at GUG. Certain T7 genes specify pairs of overlapping proteins. The two
proteins specified by gene 4 are made in about equal amounts, beginning at two
different ribosome-binding and initiation sites in the same reading frame and
ending at a common termination codon. The two proteins specified by gene 10 are
made in very different amounts. They begin at the same initiation site, but the
minor gene 10 protein appears to be produced by a shift in translational reading
frame just ahead of the normal termination codon, thereby adding 53 amino acids
to the COOH-terminal end of the major protein. Gene 10 specifies the major
capsid protein of the phage particle, and both the major and minor gene 10
proteins are incorporated into the phage particle. One or two other T7 genes
appear to utilize translational frameshifting to produce unequal amounts of
proteins that differ at their COOH-terminal ends. The amino acid sequences and
compositions predicted for all of the T7 proteins (except the proteins produced by
frameshifting) are given. T7 DNA begins and ends with a perfect direct repeat of
160 base-pairs. Immediately adjacent to this terminal repetition, at both ends of
the mature DNA, lie very similar, regular arrays of 12 imperfect copies of a
seven-base sequence. These arrays occupy about 160 base-pairs, starting about 15
base-pairs from the terminal repetition. In the concatemeric form of T7 DNA, a single
copy of the terminal repetition is flanked by these two arrays of repeated
sequences, and it seems likely that this arrangement is involved somehow in
formation of the ends of mature T7 DNA.



1. Introduction 

Bacteriophage T7 and its DNA have been the objects of considerable genetic and
biochemical investigation, and the nucleotide sequence of the first 30% of T7
DNA has been reported (Dunn & Studier, 1981; Stahl & Zinn, 1981). We have
now completed the determination of the sequence of the entire 39,936 base-pairs
of T7 DNA, and have identified the positions of individual genes and genetic
signals in the sequence.

2. Determination of the Nucleotide Sequence 

The nucleotide sequence was determined on DNA from purified phage particles of
wild-type T7 (Studier, 1969, 1979) by the methods of Maxam & Gilbert (1977, 1979), using
previously determined restriction maps of T7 DNA as the starting point for isolating
specific DNA fragments (McDonell et al., 1977; Rosenberg et al., 1979; unpublished
results). Computer programs for storing, searching and analyzing the nucleotide sequence
were developed and applied by K. Thompson and W. Crockett.

It is often convenient to give position in the T7 DNA molecule in units of 1% of the total
length of T7 DNA, beginning at the genetic left end. Previously, we used a value of 400
bases or base-pairs for a T7 unit, because we estimated that T7 DNA would be close to
40,000 base-pairs long. The nucleotide sequence reported here gives a value of 399.36 for a
T7 unit, which is the value used throughout this paper.
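
The conversion between nucleotide positions and T7 units can be sketched as follows. This is only an illustration: the function names are hypothetical, and it assumes that a position counted in base-pairs from the genetic left end is simply divided by the length of one unit.

```python
# Total length of T7 DNA reported in the text, in base-pairs.
T7_LENGTH_BP = 39_936

# One T7 unit is 1% of the total length.
BP_PER_T7_UNIT = T7_LENGTH_BP / 100


def to_t7_units(position_bp):
    """Convert a distance from the genetic left end (base-pairs) to T7 units."""
    return position_bp / BP_PER_T7_UNIT


def to_base_pairs(units):
    """Convert T7 units to the nearest whole number of base-pairs."""
    return round(units * BP_PER_T7_UNIT)


if __name__ == "__main__":
    print(BP_PER_T7_UNIT)        # length of one T7 unit in base-pairs
    print(to_t7_units(39_936))   # the right end of the molecule, in T7 units
    print(to_base_pairs(50))     # the midpoint of the molecule, in base-pairs
```

With the earlier estimate of 400 base-pairs per unit, the same position would map to a slightly smaller number of units, which is why the revised value matters when comparing with older maps.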

The sequence given in Fig. 1 is that of the l strand of T7 DNA, the strand that has its 5'
phosphate at the left end of the genetic map and has the same sequence as the T7
messenger RNAs. In the sections that follow, the transcription and translation signals in
T7 DNA will be discussed in detail. Tables 1 to 3 give the locations of these signals in the
nucleotide sequence, and can be used in conjunction with Fig. 1 to locate the sequences
referred to.
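
The relation between the two strands and the messenger RNAs can be illustrated with a short sketch. The function names and the six-base example are hypothetical and are not taken from the T7 sequence; the sketch assumes only standard Watson-Crick pairing and that both strands are written 5' to 3'.

```python
# Watson-Crick complement for the four DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def r_strand(l_strand):
    """Return the r strand (5'->3') for an l-strand sequence written 5'->3'."""
    return l_strand.translate(_COMPLEMENT)[::-1]


def as_mrna(l_strand):
    """Return the RNA with the same sequence as the l strand (T replaced by U)."""
    return l_strand.replace("T", "U")


if __name__ == "__main__":
    toy = "ATGGCT"          # made-up example, not a T7 sequence
    print(r_strand(toy))    # reverse complement of the toy sequence
    print(as_mrna(toy))     # RNA version of the toy sequence
```

Because the l strand matches the messenger RNAs, coding sequences can be read directly from the sequence of Fig. 1 without taking the complement.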

The sequence of nucleotides 12,101 to 39,936 is newly reported in this paper. In this
region, the sequence of 24,166 nucleotides was determined in the l strand and of 23,962
nucleotides in the r strand, which means that 72.9% of the sequence was determined in
both strands. The sequences across all of the restriction sites used to end-label fragments
for sequencing were overlapped from another site.
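
The fraction determined in both strands follows from the three numbers quoted above. The sketch below reproduces the arithmetic, assuming that every nucleotide in the region was determined in at least one strand, so that the overlap is the sum of the two strand totals minus the length of the region; the variable names are illustrative.

```python
# Newly reported region, inclusive nucleotide positions from the text.
region_start = 12_101
region_end = 39_936
region_length = region_end - region_start + 1

# Nucleotides determined in each strand within this region (from the text).
determined_l = 24_166
determined_r = 23_962

# Assumption: every position was read in at least one strand, so positions
# read in both strands are counted twice in the sum of the strand totals.
both_strands = determined_l + determined_r - region_length
fraction_both = both_strands / region_length

if __name__ == "__main__":
    print(region_length)                     # length of the new region
    print(both_strands)                      # positions read in both strands
    print(f"{100 * fraction_both:.1f}%")     # percentage read in both strands
```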

All of the nucleotide sequence of Fig. 1 except for nucleotides 3282 to 5902 was
determined by one of us (J.J.D.), who proof-read the entire set of sequencing films against
printouts of the computer file of the sequence. After all corrections had been made, the
computer file was used to print Fig. 1. There is nothing that we currently know about the
genetics of T7 or the physical properties of T7 DNA that is in disagreement with the
nucleotide sequence given in Fig. 1, and we expect that the error frequency in this
sequence is very low.

The sequence for nucleotides 3283 to 5901 is from Stahl & Zinn (1981) and Oakley &
Coleman (1977). Grachev & Pletnev (1981) have also reported a sequence for nucleotides
2858 to 5855, but their sequence differs from that of Fig. 1 in 80 places. We use the
sequences of Stahl & Zinn and of Oakley & Coleman in the regions we have not determined
because, in the regions that overlap, there are several differences between our sequence and
that of Grachev & Pletnev but none between our sequence and that of Stahl & Zinn.
Furthermore, restriction patterns predicted by the sequence of Grachev & Pletnev for some 

[Fig. 1: nucleotide sequence of the l strand of T7 DNA. The printed sequence listing is not legible in this copy.]

Fig. 1(b)



NUCLEOTIDE SEQUENCE OF T7 DNA 






[Fig. 1, continued: nucleotide sequence of the l strand of T7 DNA. Position markers from 15000 to 19500 and from 25000 to 28500 are visible in the margin; the printed sequence listing is not legible in this copy.]
GRCGTTGRTT RTCRGCGTCG nTCRRCGGC GRCRTTRCCG RCCCTRRCRT CTCCCTGTRT GGTCCGCRIG RTRRCTTCTT GRGCCRGCRR GCTCRGRRGG 
GCGCTRTCRT GRRCRCCCGR CTGCRRCTCR RCGGTGTCCT TCRRGRCCCT GflTflTCCTCC CTCGTCCRGfl CTCTGCTCRC TTCTTIGRCR RCTRTflTCGfi 
29000 CRRCGGTCTG GTTRCTGCCG CRRTCCCRTC TGRTGCTCRR KCRCRCRGC TTRTflflaCfl RCCCTTCRGT GRCGCTTCTR GCCGIGCTCC TGGTGCTGRC 
fTCCTGRTX CRGTCGGTGR CRRGRRGGIR RCflCTTRRCC GRGCCRCTRC GACTTRCCGfl GRGTTGRTTG GTGRGGRRCR CTCGRRCCCT CTCRTCGTCR 
CRGCRCRRCG ITCTCRGTTT GRCRCTGRCG CGRRGCTGRR CGRGCRGTRT CGCTTGRRGR TTRRCTCTGC GCTGRRCCRR CRGCRCCCRR GGRCRGCTTC 
GGRGRTGCTT CRRGGTRTCR RGGCTCRRCT RGflTRRGCTC CRRCCTGRTG RGCRGRTGRC RCCRCRRCGT COGTCCCTRfl TCTCCGCRCR GGRRCRflCTT 
CRGRRTCRGR TGRRCGCfllG GRCGRRRGCT CRGGCCRRGG CTCIGCRCGR TTCCRTOWG TCflfiTGRRCR RRCTrCRCGT RfllCGRCRRG CRRTTCCRGR 
29500 RGCGflflTCflfl CGGTGRGTGG GTCTCRRCGG RTTTTRRGCfl FRTXCRGTC RRCCRCRRCR CTGCTGRGTT CRRCCRIRCC GRTRTGGTTR RCTRCCCCflfl 
TRRGRfiGCTC GCTCOGRTTC RCRGTRTGGR CflTTCCRGRC GGTGCCRRGG RTCCTRTCOR GTTGRRGTRC CTTCfWGCCG RCTCTflflGCR CGGRGCRT FC 
CGTRCRGCCR TCGGRRCCRT GGTCPCTGRC GCTCGICROG AGTCCTCTCC CGCTCTCRTT RRCGGTRRGT TRCOOWCG RRCCCCRCCT RIGCRTGCIC 
TXCCRGRRT CCXRRTXT CRCCCTCRGT TGRTTGCTGC CCTRTRCCCfl GRCCRRCCTG RCCTflTTCH GflCCftTGCRC RTGRTOGRCR RGCRGGGTRT 
TGRCCCTCRG GTTRTTCTTG RTGCCGRCCG RCTGRCTGTT RRGCGGTCCR RRGRGCRRCG CITTGRGGRT GRTrWrT WCTCTX RCTGRRTGCR 

Fig. 1(f) 
TCTRRGGCTC CTCRWTTK CCCTRTCttR GCCTCRCTGC GCDRR7CTCC RCCTARGRT7 TR1GAC7CCG T1RPGTRTCC CTCGCCCRftt G*«ftCC«TGG 
CTRTGtWCCR CflTGflCCftlG TTCC1TRRGC RRTCTRCCTfl CRCGT7CRC7 GG7GR7GR7G TTGRCGGTCR TRCCGTTGGT GTGR7TCC7R RCRR7R7GR7 
CCACCTTfifiC TCTOCCCOft RPTCRTCGGR CCflflCCTCCG GR7R77CTGG OCCWGCRCC TRRGGGRRTC finCCCPCCB RCCC77GCR7 RRtCRPTflRG 
CRfltTGRCCfl TGTmTCTQR fiCCTCflCTCC RTTTRCCTTR TCCRCRCCRC RCGTCRRGTC RGRC7CC0R7 RCGRCRRRDfl CT1RCTCTCC RRGCTUGG* 
CTCIOraR^RCflRftCTC GRRGRCRflRG C7CGTGRGRP GCCTC7CGCT CR7CTCRRCR RGCCRGCRCC TRTRCTTCCC GCTRCCRRGG CC'GTGPRGt 
30500 ™CJGCT*P CGRGTCCGRG RGRRftCCTflR RCRCROCC7 RPG77CR7CT RCGGRCCTRR GCRGlflflCTP RRCGCTRCR7 PPGGRGGCCC TRPRTggri* 
RCTRCCRTRR GRRCCTRCCR RG1GR7TR7G RTGGTCTGTT CCRRRRGGC7 CUGRTGCCR RCGGGGTCK 7TRTCRCC77 71RCGTRSRG I^'TICCRC 
RCRRTCRCGfl TTTGTGCCTfl CRGCRRRR7C TRRGRC7GGR CCRTTRGCCfi TGPTGCRRU TRCCRRGGCfl RtCGCTRRGG CCZ-GGT^ GCC^Mflr. 
GRTCGTCCRG RCGRCGRCCG PCIGWCCCT DRGT7RGC7P TTRRTGCTGC CGCTRAGCRP C71GCRGG7C 7GGTftGGGRP C^i W ICG: GP W lli 
RRGCTCCCCT TGCGTRCRRC CRRGGCGRGC GRCGC11GCG TRRTCCRCRR UTGRGCCC! RCTCTRRGGG RGRCT7CGGP T-flftTC-TC- flKWGn- 
31000 TRRCTACRTG CGTPflCCTTC IGCmCTTCC TflRCTCTCd PTGGOGGPC RGT!DGRRRt 1UTGGTGCC R7RACCCCRR RGGGIWKGv -R-r^ 
GRGGTRGGRT 7GGCTGGRRI TGGICRCRflG CRCRRRGTRP CRCflOWC TCCTGRGTCC RtflflGMnc RCGTTRPGGG 1QTCCRRCR& ^ 
CGRRRCCRTl CGCCRRGGRC TIITCCORW CCCRCGGRCP mCRCMGflC GRGTRCRRCP G7CGI7 C «flC C77C77CGG* TICRRRPrtg -T t -'-W 
TGRRCTCTCC RflCTCRGTCC CTGGGRTGGC T11CCCICC? GGTCGTCTCG PTRR7GC771 1GR7CTC777 RWCRCRCCR TTRCGC:^ •C&'"' J 0flfl" 
TCTCRCRtCi GGRCTCCRGP GGRG7TRGRC RRGR7TCGRP CRGRGGT1RP GR*CCCTGCG TRCRTCRRCG HGTWCTGG ICC'l'l"' 7 r & 
31500 RTGRCCTCRT 7RRR77GGCT RRCCRGRRCT TTGRGRRTff CICCCGCCCt GCCGRGGCTG GCCTRGCTCC CRRRCTGRG1 GC7GG7P77* * "GG*G~ TGG 
TGTGGRCCCC CHRGCTRTG TTCCTRTGGT CGGTCTCRCT GG'RflGGGCl TT W T«m CPATMGGCT CTTGTRGTTG GTCCCCRRRG TK'V t — 
RRCGTTGCfll CCGRRGGTC7 CCGTRCCTCC G7RGC7CG7G G7GRCGCRGR C7R7GCGGG7 GCTGCCTTRG C7GCt7Ti G 7 G7T7GGCGC* GGcA-K 
CRRTCRGTGR CGCTGTRGC7 GC7GGRC7GP RRCGCRCTRP RCCRGRRGC* GRGTTCGRCR R7GRG77CRT CGG7CC7R7G mGCG^TGG RRy-^J 
GflCRGCflCGR RRCGCCRRC7 C7GCGGRCC7 C7CTCGGR7G RRCRC7GRGR RCPTGRRG77 7GRRGG7GRR CR7RR7GG7G 7CCCT7R7G* GGfi"'^ 
32000 RTRGRGRGRC GTGCCG7GG7 G7TRCRTGRT GGCTCCC77C 7RRG7GCRRG CRRCCCRR7C RPXCC7RRGR C7 C 7flflRflGR G77CTCCCRG GTT^r C 'TG 
RCRRGGCTGC GCCRGGflfl7C RRRC7CGC7G GG7TCRCCGR CR7TGGC7TG RRGRCC7TGG GG7C7GRCGR TGC7GRCR7C CG7RGRG7GC- :7P7r^ c , 
CGTTCaTCT CCTRCTCGTR TGCRG7CTGG 7GCCTCRGG7 RRGTTCGGTG CRRCRGCTTC TGRCR7CCR7 GRCRGRC7TC RTGG7RCTGP CC«&'GI«" 
7RTRRTGRC7 TG7RCRRPGC RRTG7CTGRC GC7RTCRPRG RCCCTGRG77 C7C7RC7GGC GGCGCTRRGR TG7CCCG7GR RCRRRC7CCP T«CfCT«7C7 
RCCCTRGRGC GGCRCTRGCT R7TGRGCGTC CRCPPCTRCR CRflGGCRC7C RC7CCG7C7G RGRGRRTCG7 TRTCGRCRTC R7TRRGCCTC RC7T1GRC* 
32S00 CRRGCGTGRR C77R7CGRRA RCCCRGCRR7 RTTCCGTRRC RCRRRGGC7G 7GRGTR7C77 CCC7GRGRGT CCCCRCRRRG GTRC7TRCG7 T C C7-«CGTR 
TRTGRCCG7C R7GCCRRDGC GCTGfl7GfiT7 CRRCGCTRCC GTGCCGRRGG 77TCCRGGRR GGGAT7GCCC GCTCR7GGR7 CRRCRGC7RC GT C 7;^ 
C7GRGG7CRR GGCCRGRGTC GR7GRGRTGC TTRRGGRRTT PCRCCGGGTG RRGCRRCTRfl CPXCRGRGR7 GG7RGRGRRG TRCXTR7GG R7RflGG r TTft 
TCGTRTCTCC CRCTCRGRCC RG7TCRCCRR CRGTTCCRTR mRGRRCRGR RCR7TGRGGG CT7RCTRGG7 RTCGRGRRTR RCTCR7TCC7 7GRGGCRCGt 
RRCTTCTTTG R7TCGGRCCT R7CCflTO«7 R7GCCRGRCG GRCRGCRR77' C7CRG7GRR7 GRCCTRRGGG RCTTCCR7R7 G7TCCGCR7C R7GCRGCGT 
33000 RTGRCCGCCG 7GTCRR7GG7 GRCRTCGCCR TCR7GG0GTC TRCTGCTRRR RCCRCTRRGG RflCTTRPGGR TGRGRTT7TG GCTC7CRRRG CGRRRGCTGR 
GGGRGRCX7 RRGRRGRCTG GCGRGGTRCR TttTTTRRTG GB7RCCGT7R RCRUCTTRC 7GG7CG7GC7 RGRCGCRR7C RGGRCRC7CT GT&GG«Rr C 
TCPCTCCCTG CCRTCflflTDR CCTRGGGTTC TTCCCTRRGR RCXCTRCR7 GGG7GC7CRG RRCRTTRCGG RGR7TGC7GG GRTGRT7G7C (CIGGtfi^G 
TTCCTGCTCT RGCGCRTGC7 RTCCCRflTTC TGCGTGRTRC (C7CTRCRRG TCTRflflCCRG TT7CRGCTRR GGRRC7CARG Cfi«ClCCRTG CG1MCTG" 
CGGGRRGGRG CTCGRCCRGT TGRTTCGGCC TRRRCCTGC7 0RCR1TGTGC RGCGCC7RRC CGRRGCRRCl GR7RCCGGRC C7GCCG1GGC GftfilOTCGTP 
33500 OGORCCTIGR RGTR7TCRRC RCRGGRRCtC GC7GC7CGC7 CTCCCTCCRC TRRGCTRCTG RflCGCflflCCR CTRRCTRCC7 7CTGGR7&C* GCCCGTCtWG 
G7R7GC77GC GGRTGTTRTT RGTGCCRCCC TRRCRGCTRR GRCTRCCCCC TGGGRGRRRG RRGCC7TCCT TCGTGGTGCC 7CCG7RRCU CTGRGC^CR^ 
GGCTGGCRTC RRGTCTCTCR TCRRGGRRCR TRTGGTRCGC CGTGRGGRCG GGRRG77TRC CGTTRRGGRC RRGCRRGCCT TCTC7R7GGR CCCRCGGCC7 
RTCGRCTTflT GGRGRCTCGC TCRCRRCCTR GCTGRTGROG OWTCCTCCG TCCRCRTRRG CTC7CCTTRC RGGRTTCCCR TCCCTTCCCP CCK^l* 
RGRTGGTTRT GCRG7TTRRG TCTTTCRCTR TCRRCTOXl TRRCTCTRRG 7TCC7XGRR CC77CTR7GR TGGR7RCRRG RRCRRCCGRG C^-O^GC 
34000 TGCGC7GRGC R7CflTCRCCT CTRTGGGTC1 CGCTGGTCGT TT:TflTGCTR 1GGC7XRCR CG7CRRRGCR TRCGCTC7GC CTflflGGRGRP RCC-T^OGac. 

TRCnCGRGC G7rcftCTGGR CCCRRCCR7G RTTCCCCRCG CTXG7TRTC TCCTRG77C7 CRR77GGG7G CTCC7T7CCC 7R7GC17GRC CTRG "?XTG 
G1G7TTTRGC G7 TCGR&TCC TCCRRCRTGG CTCCC7CTRC GRTTCTRCC7 RRCCRCRCCG TGRRGGRRCG TGRCCCRRRC RRRCCG7RCR ZZK'-W 
GC7RRTGGGC GC7R7GGGT1 CRRRCCTTC7 GDRflCPCRTG CCiTCCGCTG GCF7TG7GGC TRRCC7RGGG GC7RCC7TRR 1GRR7GC7GC T&GCG*Xt: 
RRCTCRCCTfl RTRRRCCflflC CGROCRGGRC TTCRTORCTG GTCTTRTCRR CTCCfiCRRflfi GRC7TRC7RC CGnRCGRCCC R71GRCTCRR Z^'^^ 
34SO0 TGRRGRTTTfl TGRGGCGRRC CGTCtTRRCT TGRGGGRGEG TRGGRRR7RR TRCGRCTCRC 7R7RGGGRGR GGCGRRRTRR TC7ICTCCC: CT^'J'*: 
RCRUTRCT7 TflflCCRCGTC RRR7GGC7RR CG7RR7TRRP PCCG7TTTGP C7 TRCCRGT1 RCR7GGCTCC RR7CG7GR77 T1RR1RTCCC G"'»G:a' 
CTRGCCCCTR RGTTCGTRG7 GC7RRCTCTT R7TGGTG7RG RCCGRORCG^ CUTflCCfl71 RflTRCRGRCT RfCGCTUa IRCRCGTRCT (CTp--- : , 
TDRCRRRCCC 7TGGGGTCCR GCCGR7GGC7 RCRCCRCCPT CCRCT7RCGT CGRG7RRCC7 CCRCTRCCGR CCGRTTGC1T GRC7TTRCGG PICC"'Wt 
CCTCCCCaC TRTCRCCTTR RCG7CCCTCfl GRTTCRRRCG mGCTOTRG CGGRRGRGGC CCGTGRCC7C RC7RCGGRTP C7R7CGC7C1 Z^W^ 

Fig. 1(g) 
35000 OCTCfiCTTX A7GCTCG7CG ICC7CCflRI7 C7GRRCC7RG CCflRCCCCGI CGR7CRCCCC GPTGCTGTTC CGTTTCGTCR RCTflflflGRCC RTGRRCCRGP 
RC7CR7CGCA RCCRCCTfiflT CfifiCCCTTflC ACT7CCGTflfl 7GRGGCTGRG flCirTCPGfW RCCRRGCGGR GGGCTTTPRG RRCGRGTCCR GTfiCCfifiCGC 
TRCGRRCACR OfiCCRGTGa GCGR7GAGAC CRRCCGT7TC CGftGflCCfiflG tCRfiCCCCTT CPOGPATPCC GC7GG7CRRT flCGCTftCPTC TGC7GGGRRC 
TCTGCTTCCG CTCCCCflTCfl fiTCTGfiGGTfl fifiCGCTGfiGfl RCTCTCCCRC RGCRTCCGCT PPCTCICCTC ATTTGGCRGR RCRGCRAGCR GfiCCGTGCGG 
flfiCGIGRGGC RCflCfiflGCTG GRRRRTTRCR RTGGRTTGGC 7GG7GCRR7T GRTRRGCTRC RTGCflflCCAfl 7G7G7RCTGG RRflGGARRTfl TTCRCGCTfifi 
CGCGCGCCTT TRCRTCRCCfl CRRflCGCITT T&RCTGTGGC CflG7R7CARC RGT7CTTTCG rCGTGTCRCT RRTCGTTRCT C7G7CR7GGR GTGGGGRGflr 
GRGflftCGGRT GCCTCflTCTR rciTCRRCCT RGRCflCIGGR CRRCRCCGR7 RGGCGGTfiflC RTCCPGT7RG TRC7ARRCCC RCRGRTCRTC RCCCRPGC7G 
GRGCCRTGflC CCCTCRCCTfl RftflTTGCRGR RICGGCRTGT fCIlCRflTTR GRGTCCGCR7 CCGflCRRCGC GCAC7BTRTT CTRTCIPRRG RTGGTRRCRG 
GAflTRRCTGG TRCRTTGCIR GRGGGTCRGR TRRCRRCRflT CRCTG'fiCCT 7CCRC7CCTR TGTRCATGCT RCGRCCTTflfl CRC7CRRGCR GGRCTRTGCfi 
GTRGTTflflCR RflCRCriCCR CGTRGGICRG GCCCTTGIGG CCftCTWICG IRRTRITCRR GGTRCTARG7 GGGCflGGTflfl R7GCtTCCAr GCTTRCCTRC 
GTGRCRGCTT CGTTGCGRRG TCCRRGCCG7 GGRCICRGGT G7GGTCIGG7 RC7CC7CGCG GTGGGGTRRC IGTGRCTCTT TCRCRGGRTC ICCCCTTCCC 
CRRTRTCTCG ATTRRGTGTG CCRRCRRCTC irCGRRCTTC 77CCGTRC7G GCCCCGRTGG RRTCTRCTTC RTRGCCTCTG R7GG7GGRTG G77RCGATTC 
CRRRTRCRCT CCRRCGGTCT CGGRTTCRRG RR7RT TGCRG RCRGTCG* TC RGTRCC7RR7 GCRRTCR7GG TGCRGRRCGR G7RRTTGGTR RflTCRCRRGG 
RRRGPCGrCT RG7CCRCGGR 7GGRC7C7CR RGGRGC7RCR AGC7&CTRTC R 7 ~ RCRC ITT flRCRRCGflflT TGRITRPGCC IGCTCCRRTT GTTCGGRCGG 
GICrRCCRGR rCTTRGTGCT CGRCTGTTCT nCGGTTRRG CCTTflRCGRR :&G7TC7RCG 77GC7GC7R7 CGCCTRCRCfl GTGGT TCRGR 7TGCTCCCRR 
GCTRGTCGRT RflGRTGflnC RCTGCRRGRfl RGCCflfiTRRG GRCT&RTRTG IflrOGflflflfiG GR7RRGRGCC 77RTTRCRT7 CTTRGRGRTG TTGGRCRCTG 
CGRTGGCTCft GCGTRTGCTT GCGGRCCTTT CGGRCCRTGfl GCGTCCCTCT CCGCRRCTCT flTRRTGCTRT TflflCflflfiCTG T7RGRCCCCC RCRRGTTCCR 
GRTTGGTPRC ITGCRGCCGC RTGrTCRCRI CTTRGGTGGC CT7GCTGG7G CTCTTGRRGR GTRCRfiflCRG RflflGTCGGTG RTRPCGGTCT 7flCGGR7Gm 
GflTflTTTRCR CRTTRCRGTG R7RTRCTCRR CGCCRCTRCR GRTRCTGGTC TTTflTCGflFG TCRTTGTCTR TRCGRCRTCC ICCTflCGTGfl ARTC7GRRRG 
TTRRCGGGRG GCRTTRTGCT RGRR77T7TR CGTRRGCTRfl TCCCTTGGGT TCTCCCTCGG RTGCTfiTTCG CGUflGCRTG CCflTCTfiGGG rCBGfiCTCRfl 
TGGRCGCTRfl RTGGflfiflCRG GRGGTRCRCR RTGRGTRCGT TflflGfiGRGTT GfiGCCIGCCfi RGAGCRCTCR RRGRGCflflTC GRTGCGCTRT CTGCTflflGTR 
TCRRGRRGRC CTTGCCGCGC rCCRfiCGGfiG CRCTGfiTRGG RTTRTTTCTG RTTTCCGTRG CGRCflflTRfiG CGGT 7GCGCG TCRGOGTCRR RRCTRCCGGR 
RCCTCCGRTG GTCRGTGTGG RTTCGRGCCT GRTGCTCGRG CCGRRCTTGR CGRCCGRGRT GCTRflftCGTR TTCTCGCRGT GRCCCRCRAG GGTGRCGCRT 
GGRTTCGTCC GTTRCRGGRT RCTfiTICGTG RflCTGCRRCG TflflGTRCGRR RTCflflGTfiflG GRCCCflflTGT G7CTRCTCRR TCCRRTCGTR RTGCGCTCGT 
RGTGGCGCRR CTGRRRGGRG RCTTCGTGGC GTTCCTRTTC GTCTTRTGGfl RGGCGCTRRfl CCTRCCGGTG CCCRCTORGT GTCRGRTTGR CRTGGCTflfiG 
GiGCTGGCGfl RTCGRGRCRR CRRCRflGTTC fl7C77RCRCC CTTrcCGTGG fRTCGGTRRG TCGTTCRTCR CRTGTGCGTT CGTTGTGTGG" TCCTTRTGGR 
GRGRCCCTCR GTTGRRGRTR CTTRTCGTRT CRGCCTCTRfl GGRGCGTGCR GRCGCTRRCT CCRTCTTTRT TRRGftRCRTC RTTCRCCTGC TGCCR7 TCCT 
RTCTCRCTTR RflCCCRflGRC CCGGRCRGCG TGRCTCGGTR R7CRGC TTTG RTGTRGGCCC RGCCRRTCCT GRLCRCTCTC CTRGTGTGRR RTCRG7RGC7 
RTCRC7GGTC RG7TRRC7GG TRGCCG7GC7 GRCRTTR7CR 77GCGGRTGR CG7 7GRGRT7 CCC7C7RRCR GCGCRRC7R7 GCGTGCCCC7 GRGflflCC7R7 
GCRC7CTGGT 7CRGGRG77C GC7CCC77RC 77RRRCCGCT GCC7TCCTC7 CGCG77R7CT RCCT7GGTRC RCCTCRGflCR WCBTOCTC 7C7R7RRGGR 
RCT7CRGGR7 RRCCGTCCG7 RCRCRRCCR7 IR7C7GGCCT GC7C7G7RCC CRRGGRCRCG TGRRGflCflRC C7C7RTTRC7 CRCRGCG7C7 TGCTCCTR7G 
T7RCGCGCTG RG7flCGfl7Gfl GRRCCCTGRG XRC7TGC7G GGRCTCCRRC RGRCCCRG7G CGCTTTGRCC GTGR7GRCC7 GCGCGRGCGT CPGT7GGRR7 
RCGGTRRGGC TGGCTTTRCG CTRCRG77CR TGC7TRRCCC 7RRCCT7RG7 GR7GCCGRGR RGTRCCCGC7 GRGGCTTCG7 GRCGC7RTCG TRGCCGCC7T 
RGRC77RGRG RRGGCCCCRR TGCR77QCCR G7GCC7TCCG RflCCdCPGR RCRTCR77CR GGRCCTT C C7 RRCG77CCCC TTRRGGGTGR 7GRCC7GCR7 
^TRCCRCG-RTTGTTCCRR CRRC7CRGG7 CRGIRCCRRC R&RRGRT IC 7 GGTCRT7GRC CC7RGTGGTC GCGG7RRGGR CGflflflCRGG7 TRCGCTG7GC 
TG7RCRCRC7 GflflCGG7IRC R7CTRCCTTR 7GGRRGC7GG R&G77TCCG7 GP7GCC7RCI CCGRTflflGRC CC77GRGTTR CTCGCTRftCR RGGCRRRGCR 
R7GGGGRG7C CRGRCGGTTG IC7RCGRGRG 7RRC7TCGG7 GRCGG7RIG7 TCXfRRGG7 R77CRGKC7 RTCCTTCT7R RRCRCCRCflfl C7G7GCCR7G 
^RGRGRTrc G7GCCCG7GG 7fl7GRflRGRG R7GCG7RTTT GCGRIRCCCT iGACtCRCTC RIGCRGRCTC RCCGCCTIG7 RR77CG7GRT GRGGTCRTTR 
GGGCCDRCTR CCRG7CC&C7 CCICRCC-TBC RCGGTRRGCR 7GRCC7:flRC :RCICG77GT TC7RCCRGRT GRCCCG7R7C RC7CG7GRGR RRCGCGCTC7 
OGC7CS7GRI CACCCR7TGG R7&CCC"CC GTTOGGCPTI OP C IRTCT C C GTWCICCflf GCRGT TGGRT 7CCG77flflGG TCGRGGG7&R RG7RC7TGC7 
39000 ORCTTCC77G RGGRRCRCRT GR7&CG T CI7 flCCGITOCTC CTRCGCRTRI CPrTGRGRIG r C 7G7GGGRG GRGrTGR7C7 G7RCTCTGRG GfiCGRTGRGG 
GTIRCGG7RC£7C7TTCR77 CRG7GGIGRf riRTCCRHR GGRC7GCR7R GGGRIGCRC 7 R7RGflCC3CG GR7GC7CRG7 7C77TRRG7T RC7G»¥WGR 
CRCGRTRfifll 7RRIRCGRC7 CRCIRTRGGG RGRGGRGGGR CGRRHGCHfl C7RTRTRGRT RC7GRftIGPfl 7RCT7RTRGA G7GCR7RRRG TRTCCflrRR7 
GC7GTRCCTR GRCIGRCCic TRRGRR7GC7 GR7 IRIHI TG T P7IQGTRIC ^CC7TflfiCti PflGGRCCRRC R7RRRGGGRG GRGRC 7CR7G UCCGCT7RT 
IGT7GRRCC7 RC7GCGGCRT RGftGICRCIF RCC&RTI IC7 IG;GGIRCr7 TC7GC7aCC nCGGIRCGC RIC 7C7TRCT CGRGRCCTCR G7 7CRCTGGR 
G7CIC7CG77 TGCrC7flIflC TCRC7TGIRC CGRF7RGGGT C77CCIGRCC GRC7GR7GCC 7CRCCGRCGG R77CRGCGG7 RTGRTTGCRT CRCRCCRC77 
CRrcCC7R7R GRG7CRRG7C CTRRGGIRIR CCCRTRfiflGR GCCTCIRRrc G7C7R7CCTR RCG7C TRTftC CTRFWGR7RG CCCRTCCTR7 CRG7GTCRCC 
IRRRGRGGG7 C7TfiGRGRGG GCCTR7GGRG T7CC7R7RGG G7CC77TRRR PIflrflCCR7fl RPRR7C7C«G rGR<ITA7CTC RCRGTGTRCG GRCCTRRRGT 
ICCCCCRTRG GCGGTRCC7R RfiGCCCflGCC flflrCRCC7flfl ftG7CRflCC77 CGC7TGRCC7 ICRCCGT7CC C7fl«CCCTTG CCGWGaCCC T7GCGTT7G7 
C7TTCGC7GT 7RCC7TGRG7 CTCTCTC7GT GTCCC7 1 . ^ 

Fig. 1(h) 

Fig. 1. (a) to (h) Nucleotide sequence of the l strand of T7 DNA. The sequence reads 5' to 3' from 
left to right. The l strand is, by definition, the strand that has its 5' end at the genetic left end of T7 
DNA; it has the same sequence as the T7 RNAs. The sequence hyphens have been omitted for clarity. 
Table 1 

Transcription and translation signals in the early region of T7 DNA 



RNA polymerase' RNAase IIP nj „ t « „ .. . _ x . t 

£. ro/i T7 Rtt<»c l>rw^;« < ol-<i ~. * n 



sites |>rotein< RK< + aa Nurl ,oiides Ti M 



AO 

Al 

A2 
A3 



4U'» n,| 



*/*/A 



R03 



Ml 



HI 



RJ1 



Rl-3 



I M7- <*25-l27<i 

^ 3 127*1431 



49s 

ti-'ti 157 

75<» l-SS 

S9o 2-23 



2 32-3 2<» 

3 20 3 5* 

3 75-4 lo 



fj 2 4 ~ + l4W-|fi3. „„. 

™i J & + H»M7ft5 4IO-4 49 

°' 7 2 359 + 2<r21-3<i9h 



TE 



4- H*-4-93 

5- <Ki-7*7t> 
3I3« rm 

883 + 3l71-oK2*» 7-&4-14 57 

5X4* U-tM 
5KK7 - H74 
5923 I4-K3 
['I [ 42 + 6*M*7-€I33 LVm-I5-3« 

15 37-lfHH 
04*19 1^15 
&44K 1H 15 

t>475-7552 16-21 -18-91 

75KK |chki 



12 2 H5- H137-6392 



M 1 359 + 



Promoters, termination sites, RNAase III cleavage sites and T7 proteins are identified 
in the text. Maps of their relative positions are given in Figs 2 and 4. 

a The nucleotide given for a promoter is the known or predicted first nucleotide of the RNA chain 
initiated at that promoter, and the nucleotide given for a transcription termination site is the last 
nucleotide of the majority of RNA chains terminated at that site. The transcription signals for E. coli 
RNA polymerase are the 3 major early promoters (A1, A2 and A3), the minor leftward 
promoter (A0), and the transcription termination site at the end of the early region (TE). 
Locations of other minor promoters for E. coli RNA polymerase are given in Table 6. 

b The nucleotide given for an RNAase III cleavage site is the nucleotide 
at the 3' end of the RNA chain produced by cleavage at the site. Cleavage at some sites is 
relatively inefficient; this is indicated by parentheses in Tables 2 and 3. 

c The gene numbers for the 5 potential overlapping genes are given in parentheses in Tables 2 and 3. 
The nucleotides given for each protein are the first nucleotides of its initiation and 
termination codons. 

d The reading frame (RF) is determined by dividing the number of the first nucleotide of the 
initiation codon by 3: if the remainder is 1/3, the reading frame is 1; if 2/3, the reading frame is 2; 
if there is no remainder, the reading frame is 3. 

e The number is the total amino acids (aa) predicted for the protein, including the initiating 
methionine. The initiating methionine is known to be removed from at least some of the T7 
proteins; proteins predicted to retain the initiating methionine are indicated by +, and those predicted 
to lose the initiating methionine are indicated by - (see Dunn & Studier, 1981). 
f A T7 unit is 399.36 nucleotides or nucleotide pairs. 
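
The conventions in the footnotes above lend themselves to a short worked sketch in Python. It is an illustration only and not part of the original analysis: the function names are hypothetical, and it assumes that the two coordinates listed for a protein are the first nucleotides of its initiation and termination codons, which is what the coordinates and lengths listed in Table 2 indicate.

```python
# Illustrative helpers; all names here are hypothetical.
T7_UNIT_NT = 399.36  # nucleotides per T7 unit, as stated in the footnote


def reading_frame(init_first_nt: int) -> int:
    """Reading frame (1, 2 or 3) from the first nucleotide of the initiation codon.

    Remainder 1 on division by 3 gives frame 1, remainder 2 gives frame 2,
    and no remainder gives frame 3.
    """
    remainder = init_first_nt % 3
    return 3 if remainder == 0 else remainder


def residues_including_fmet(init_first_nt: int, term_first_nt: int) -> int:
    """Number of codons from the initiation codon up to, not including, the
    termination codon. Assumes both coordinates are first nucleotides."""
    span = term_first_nt - init_first_nt
    if span <= 0 or span % 3 != 0:
        raise ValueError("coordinates are not an in-frame initiation/termination pair")
    return span // 3


def to_t7_units(nucleotide: int) -> float:
    """Convert a nucleotide position to T7 units, rounded to two decimals."""
    return round(nucleotide / T7_UNIT_NT, 2)


if __name__ == "__main__":
    # Gene 1.4 as listed in Table 2: nucleotides 7608-7761.
    print(reading_frame(7608), residues_including_fmet(7608, 7761), to_t7_units(7608))
```

Applied to the Table 2 entry for gene 1.4 (7608-7761), the sketch gives reading frame 3, 51 amino acids including the initiating methionine, and a left end at 19.05 T7 units, in agreement with the tabulated values.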
Table 2 

Transcription and translation signals in the class II region of T7 DNA 



RNA polymerase  RNAase III                                Position in T7 DNA
sites           sites       Protein  RF  fMet + aa    Nucleotides      T7 units

                            1.4      3   51 +         7608-7761        19.05-19.43
φ1.5                                                  7778             19.48
                            1.5      3   29 +         7791-7878        19.51-19.73
φ1.6                                                  7895             19.77
                            1.6      1   86 +         7906-8164        19.80-20.44
                            1.7      3   196 -        8166-8754        20.45-21.92
                            1.8      1   48 +         8749-8893        21.91-22.27
                            2        3   64 -         8898-9090        22.28-22.76
φ2.5                                                  9107             22.80
                            2.5      2   232 -        9158-9854        22.93-24.67
                            2.8      2   139 +        9857-10,274      24.68-25.73
                            3        3   149 -        10,257-10,704    25.68-26.80
                            3.5      2   151 -        10,706-11,159    26.81-27.94
φ3.8                                                  11,180           27.99
                (R3.8)                                11,203           28.05
                            3.8      2   121 +        11,225-11,588    28.11-29.02
                            4A       3   566 +        11,565-13,263    28.96-33.21
                            (4.1)    1   40 -         11,635-11,755    29.13-29.43
                            4B       3   503 +        11,754-13,263    29.43-33.21
φ4c                                                   12,671           31.73
                            (4.2)    1   112 +        12,988-13,324    32.52-33.36
φ4.3                                                  13,341           33.41
                            4.3      2   70 +         13,352-13,562    33.43-33.96
                            4.5      3   89 -         13,584-13,851    34.01-34.68
                R4.7                                  13,892           34.79
φ4.7                                                  13,915           34.84
                            4.7      1   135 +        13,927-14,332    34.87-35.89
                            5        1   704 +        14,353-16,465    35.94-41.23
                            5.3      1   118 +        16,483-16,837    41.27-42.16
                            5.5      3   99 -         16,851-17,148    42.20-42.94
                            5.7      2   69 -         17,150-17,357    42.94-43.46
                            6        1   348 -        17,359-18,403    43.47-46.08
                            6.3      3   37 +         18,393-18,504    46.06-46.33



restrit tion endonueleases are at variant u-ith *u 

results,, whereas w have S v t (""Polished 

patterns and those predicted bv the .sequence of tl iT™ ° bsened friction 

Portions of the nucleotide «». . que f nce °' rl g 1 f ™ any enzvme tested 

rr e.forT7 RXA S^S^^ 1*^1? \} h < «*« a round 

F 'f , ' 'L"V C ° mp,et<i a ^ ment with the sequent art^d ih ^T" ^ * iven in 

pubhshed by Oakiev el al. (1979) as eornXdT. p ,\ t' 3 P rom ° t *«- (position 68 3) 
around the H -3 and ^ l } 9 ? lb J- ™* the sequences 

Mc Allister (.981,. Thenfare diZpJSZZ^ ^ Vein * & 
sequent published by Rosa (1979.l98lTand RoL * ^»" ence g' ven in Fig. 1 and the 
Table 3 

RNA polymerase  RNAase III                                Position in T7 DNA
sites           sites       Protein  RF  fMet + aa    Nucleotides      T7 units

φ6.5                                                  18,544           46.43
                R6.5                                  18,562           46.48
                            6.5      1   84 +         18,604-18,856    46.58-47.22
                            6.7      2   88 -         18,863-19,127    47.23-47.89
                            7        1   133 -        19,129-19,528    47.90-48.90
                            7.3      1   99 -         19,534-19,831    48.91-49.66
                            7.7      2   130 +        19,847-20,237    49.70-50.67
                            8        1   536 -        20,239-21,847    50.68-54.71
φ9                                                    21,864           54.75
                            9        1   307 -        21,949-22,870    54.96-57.27
φ10                                                   22,903           57.35
                            10A      1   345 -        22,966-24,001    57.51-60.10
                            10B      1   398 -        22,966-24,159    57.51-60.49
Tφ                                                    24,209           60.62
                            11       2   196 +        24,227-24,815    60.66-62.14
                            12       1   794 -        24,841-27,223    62.20-68.17
φ13                                                   27,273           68.29
                (R13)                                 27,280           68.31
                            13       3   138 +        27,306-27,720    68.37-69.41
                            14       1   196 -        27,727-28,315    69.43-70.90
                            15       1   747 -        28,324-30,565    70.92-76.53
                            16       3   1318 +       30,594-34,548    76.61-86.51
φ17                                                   34,565           86.55
                            17       3   553 -        34,623-36,282    86.70-90.85
                            17.5     1   67 +         36,343-36,544    91.00-91.51
                            18       3   89 +         36,552-36,819    91.53-92.20
                R18.5                                 36,855           92.29
                            18.5     1   143 +        36,916-37,345    92.44-93.51
                            (18.7)   2   83 -         37,031-37,280    92.73-93.35
                            19       1   586 -        37,369-39,127    93.57-97.97
                            (19.2)   2   85 -         38,015-38,270    95.19-95.83
                            (19.3)   2   57 -         38,552-38,723    96.53-96.96
φOR                                                   39,228           98.23
                            19.5     1   49 +         39,388-39,535    98.63-99.00

See the legend and footnotes to Table 1. 



'^Z^TT^ & ^ 1—- - observe, 

"at ml f fnigment6 »- "™<in%2 frtUeS //iil ST™** 1 HaeUI < 69 fragments,, 
tJL ,2 D fra e ment8 >- Thai (66 L^en'J^t^V"^^ 1 * CiI (, ° ^ents 
thesda IW rch Laboratories, Inc. orTrZL^S^ £TT ^ ° bt * ined f ™ 
i he above dteests were aim K^ k i . England BioLabs. 

^y^^ ^^ or a gradient of 3 o /o 

osenberg d at., 1979). Under these conditio i \ 20mM - 8oet,c 2 mM-Na 3 EDTA 
s generally a smo oth function 7 ZS^J^J"™* of the ™* fragmente 
re analyzed by electrophoresis in j^SZ^t when *e same digest* 

• fragment* had a relative mobiliVv Xed? 0f t T T0Tm °° n ^^ti 0 n. many of 

*t,ve «e. Apparently, factors otheV ^JfhT ^ ^ pr6dicted { ™ their 

nan size can have a significant influence on relative 



490 J. J. DUNN AND F. W. STUDIER 

mobility of DNA fragments in uniform polyacrylamide gels, at least in this buffer system, 
whereas size of the fragment is the dominant determinant of relative mobility in gradient 
polyacrylamide gels of appropriate composition. 

3. T7 Genes 

(a) Identification 

Genetic and biochemical analyses of T7 hav» - 1 i 
existence of at «t A, provided evidence for the 

^^^^zi^ ^v.rr- 1969 , 972 , 

communication: unpublished results,. All 38 of*£ l^n ^T* 
located in the nucleotide sequence of T7 DXA in at Lt 2 of t. J* 
deletion mapping (Studier et «/.. 1979; Studier 1981 , bv " ^ Wa - vs ; ^ 
mutants with cloned fragments of wild-tyM T7 D\A Js'tud" Tp T" ° f T? 
unpublished results), and by direct deter^ina^ntf 1 f ^j* 111 *^ 1981 : 
the site of the mutation (Dul e, ^ d SS^^/^"tST^= * 

The codmg sequences for the 38 known genes do not o« JeSv ST th 
nucleot.de sequence of T7 DMA. However, the gaps be"wlnTnn , 
occupied by potential genes, each of which hasl ^Z* k. ? W 
binding and initiation 1 for protein fl^iZ^,?;^^ 

(Tabled ,°to7I„H F S ! Ud,er - I98,) ' nd in the "fining DNA 

an arrangement that strongly suggests all 50 ofC ^Zt^ZT^ 

Fig. 2. Map of T7 DNA. Labels recovered from the figure: Early; DNA metabolism; Virion structure 
and assembly; gene numbers; E. coli RNA polymerase (A1, 2, 3; TE); T7 RNA polymerase; 
RNAase III cleavages; scale 0 to 100. Genes are represented by the open boxes, and those with 
integral numbers are indicated, as are the locations of promoters, transcription 
termination sites and RNAase III cleavage sites. 
infection. The proteins not yet identified are small (29 to 143 amino acids), so it is 
not surprising that they have escaped detection. 

Genes whose coding sequences overlap in different reading frames have been 
found in bacteriophage φX174 (Barrell et al., 1976; Sanger et al., 1978) and might 
also be expected to occur in T7. We have searched for potential genes whose 
coding sequences would overlap one or more of the 50 close-packed genes (see 
Synthesis of T7 Proteins, section 7(b)). Five potential overlapping genes have been 
identified, which would specify proteins of 40 to 112 amino acids, and preliminary 
genetic evidence (unpublished results) suggests that one of them is expressed. For 
convenience, we refer to genes in the above set of 50 as "close-packed", and genes 
whose coding sequences overlap one of the close-packed genes in a different 
reading frame as "overlapping". 
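
Whether two coding sequences share nucleotides, and whether they are read in the same frame, follows directly from their coordinates. The Python sketch below is illustrative only: the `Gene` type and the helper names are hypothetical, the coordinates are assumed to be the first nucleotides of the initiation and termination codons, and the values are those listed in Table 2 for genes 1.7, 1.8, 4A, 4B and (4.1).

```python
from typing import NamedTuple


class Gene(NamedTuple):
    """Hypothetical record for one coding sequence."""
    name: str
    start: int  # first nucleotide of the initiation codon
    stop: int   # first nucleotide of the termination codon (assumed convention)


def frame(gene: Gene) -> int:
    """Reading frame 1, 2 or 3 from the start position (no remainder means frame 3)."""
    remainder = gene.start % 3
    return 3 if remainder == 0 else remainder


def overlap_nt(a: Gene, b: Gene) -> int:
    """Nucleotides shared by two coding sequences, termination codons included."""
    a_end, b_end = a.stop + 2, b.stop + 2  # last nucleotide of each termination codon
    return max(0, min(a_end, b_end) - max(a.start, b.start) + 1)


def same_frame(a: Gene, b: Gene) -> bool:
    return frame(a) == frame(b)


# Coordinates as listed in Table 2.
GENE_1_7 = Gene("1.7", 8166, 8754)
GENE_1_8 = Gene("1.8", 8749, 8893)
GENE_4A = Gene("4A", 11565, 13263)
GENE_4B = Gene("4B", 11754, 13263)
GENE_4_1 = Gene("(4.1)", 11635, 11755)

if __name__ == "__main__":
    for a, b in [(GENE_1_7, GENE_1_8), (GENE_4A, GENE_4_1), (GENE_4A, GENE_4B)]:
        print(a.name, b.name, overlap_nt(a, b), same_frame(a, b))
```

With these values, the adjacent close-packed genes 1.7 and 1.8 share only 8 nucleotides, potential gene (4.1) lies wholly within gene 4A in a different reading frame, and 4A and 4B share the same frame.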

T7 genes are numbered in order from left to right, according to their position on 
the genetic map: the 19 genes on the original genetic map have integral numbers 
(Studier, 1969), and genes added subsequently have decimal numbers. Potential 
overlapping genes are given numbers according to the relative positions of their 
left ends. The first gene at the left end of T7 DNA is gene 0.3 and the last gene at 
the right end is gene 19.5. (A gene 20 was proposed by Pao & Speyer (1975), but 
our unpublished results on a mutant kindly provided by Dr Speyer indicate that 
the proposed gene 20 mutation actually lies in gene 5v.) 

The 55 known and potential T7 genes we have identified in the nucleotide 
sequence are listed in order in Table 4, together with the predicted sizes and, 
where known, functions of the proteins specified. An example of the pattern of 
protein synthesis during T7 infection is given in Figure 3. T7 genes are expressed 
co-ordinately in three groups (Studier, 1972): the early, or class I genes are 
transcribed by Escherichia coli RNA polymerase, and include functions to 
overcome host restriction and to convert the metabolism of the host cell to the 
production of T7 proteins; the class II genes are the next to be expressed, and 
include functions involved in DNA metabolism; the class III genes are the last to 
be expressed, and include genes for proteins of the phage particle and functions 
involved in maturation and packaging of the DNA. The boundary between class I 
and II genes is the transcription termination site for E. coli RNA polymerase 
between genes 1.3 and 1.4; the boundary between class II and class III genes is 
the φ6.5 promoter for T7 RNA polymerase, located between genes 6.3 and 6.5. 
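
Because genes are numbered from left to right, the two boundaries just described are enough to assign a class from the gene number alone. The following Python sketch illustrates this; the function names and the label-parsing rule (ignore parentheses and a trailing A or B) are assumptions made for the example rather than anything defined in the text.

```python
import re

# Class boundaries as described in the text: between genes 1.3 and 1.4,
# and between genes 6.3 and 6.5.
CLASS_I_UPPER = 1.4    # genes numbered below 1.4 are class I
CLASS_II_UPPER = 6.5   # genes numbered below 6.5 (and not class I) are class II


def gene_number(label: str) -> float:
    """Numeric part of a gene label such as '0.3', '4A', '10B' or '(4.1)'."""
    match = re.match(r"\(?(\d+(?:\.\d+)?)", label.strip())
    if match is None:
        raise ValueError(f"unrecognized gene label: {label!r}")
    return float(match.group(1))


def gene_class(label: str) -> str:
    """Return 'I', 'II' or 'III' for a T7 gene label."""
    number = gene_number(label)
    if number < CLASS_I_UPPER:
        return "I"
    if number < CLASS_II_UPPER:
        return "II"
    return "III"


if __name__ == "__main__":
    for label in ["0.3", "1.3", "1.4", "(4.1)", "6.3", "6.5", "10B", "19.5"]:
        print(label, gene_class(label))
```

The grouping this produces matches the class headings of Table 4, with genes 0.3 to 1.3 in class I, 1.4 to 6.3 in class II, and 6.5 to 19.5 in class III.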

The sizes predicted for the T7 proteins are generally in good agreement with the 
sizes estimated from gel electrophoresis in the presence of sodium dodecyl sulfate 
(Studier, 1972, 1973b, 1981; Studier et al., 1979; see Fig. 3). Most T7 genes appear 
to specify a single protein, but at least four genes are known or thought to specify 
two distinct proteins, referred to by the gene number followed by A or B. These 
double proteins are discussed in section 7, Synthesis of T7 Proteins. 

We proposed a gene 0.65, based on the presence of a substantial open reading 
frame but a somewhat unusual ribosome-binding and protein initiation site (Dunn 
& Studier, 1981). With the discovery that genes 5.5 and 10 apparently produce 
double proteins by frameshifting during translation (see Synthesis of T7 Proteins, 
section 7(c)), it seems possible that the open reading frame previously assigned to 
gene 0.65 may in fact be translated primarily by frameshifting during synthesis of 
Table 4 

Known and potential proteins specified by T7 DNA 



Class [ 



Class II 



Class HI 



Gene 



(h3 

04 

0-5 

0-6A 

06B 

0-7 

I 

11 
12 
13 



14 
15 
/tf 
. *7 
18 
2 

2-5 

2- 8 
3 

3- 5 
38 
4A 
4B 
(41) 
(4-2) 
43 
45 
47 
5 

53 
55 
5-7 
6 

63 

65 
67 
7 

7-3 
7-7 
8 
9 

10A 

10B 

11 

12 

13 

14 

15 

16 

17 

175 

18 



Amino b 
acids 



116 

50 
47 
53 
111 
359 
883 
42 
to 
359 

51 
29 
86 
195 
48 
63 
231 
139 
148 
150 
121 
566 
503 
39 
112 
70 
88 
135 
704 
118 
98 
68 
347 
37 

84 

87 
132 

98 
130 
535 
306 
344 
397 
196 
793 
138 
195 
746 
1318 
552 
67 



13.678 
5621 
4744 
6201 
(13.250) 
41.124 
98.092 
5180 
10.059 
41.133 

5446 
3174 
9946 
22.053 
5781 
7043 
25.562 
15,617 
17.040 
16.806 
14,329 
62.656 
55,743 
4265 
12,653 
7927 
9960 
15.208 
79,692 
13.067 
11.075 
7280 
39.995 
4088 

9474 
9207 
15.303 
9937 
14.737 
58.989 
33.766 
36.414 
(41.800) 
22.289 
89.265 
15.852 
20.836 
&t.210 
143.840 
61.441 
7391 
10.145 



Function 



Inactivates host restriction 



Protein kinase 

T7 RXA polymerase 

Replication 
DNA ligase 



Inactivates host RXA polymerase 
Single-stranded DNA-binding protein 

Endonuclease 
Amidase (lysozyme) 

Primase 
Primase 



DNA polymerase 

Permits growth on X lysogens 
E x on uc lease 



Host range 
Host range 

Head-tail protein 
Head assembly protein 
Major head protein 
Minor head protein 
Tail protein 
Tail protein 
Internal virion protein 
Internal virion protein 
Internal virion protein 
Internal virion protein 
Tail fiber protein 

DNA maturation 
Table 4 (continued) 



Amino b 
^ne ' acids 



- v r c Function" 



m 143 ih^43 

82 9)95 

(/!■*! *S 66 ,3 ° DXA matur *tion 
(/f?) 56 
19* 49 



W29 
5434 



* Se n^icn'l't 5 '?" nti f l 1 ' ow «-,,h„ p genes given in .^ntheses. 
gene LrfM £tt TOrreSPO " dS * * 

the gene 0.6 protein. This would be consistent with all of the information we have, 
but the nature of the proteins produced from this region remains to be 
determined. Gene 0.65 is no longer listed as a separate gene. 

(b) Mutations 

The locations in the nucleotide sequence of point mutations in several different 

Studier, 1981). Several additional point mutations have been located in the 
remaining sequence. The locations of the mutations and the predicted effects on 
the proteins are given in Table 5 for all of the mutations so far identified in the 
nucleotide sequence. All of the amber mutations are in the reading frame 
predicted for the protein. The amber mutations in genes 5 and 6 were chosen 
for analysis because they were known to lie near the COOH-terminal end of these 
relatively long proteins and would demonstrate that the predicted reading frame 
is correct. 

The two known mutations in gene 5.5 (Studier, 1981) have both been located in 
the sequence. As expected, the B64 mutation is an amber mutation, which defines 
the reading frame of the gene; this mutant also contains a missense mutation in 
gene 5.3. The B31 mutation, which causes a marked decrease in mobility of the 
gene 5.5 protein in sodium dodecyl sulfate/polyacrylamide gel electrophoresis 
(Studier, 1981), changes an arginine residue to glutamine, the only change 
found in the gene 5.5 protein. Apparently, a single amino acid change 
can produce a substantial change in mobility in sodium dodecyl sulfate/polyacrylamide 
gel electrophoresis, at least for a protein of this size. (This 
mutant also has a missense mutation in the gene 5.3 protein, and 
another conceivable explanation for the change in mobility is that the gene 5.3 
protein normally processes the gene 5.5 protein.) 
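
The predicted effect of a single codon change can be checked mechanically against the standard genetic code. The Python sketch below is a minimal illustration under that assumption; the function and constant names are hypothetical, and the codon changes in the usage note are ones legible in Table 5.

```python
from itertools import product

# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Conventional names of the three termination codons.
STOP_NAMES = {"TAG": "amber", "TAA": "ochre", "TGA": "opal"}


def classify_change(old_codon: str, new_codon: str) -> str:
    """Describe the effect of changing one DNA codon into another."""
    old_codon, new_codon = old_codon.upper(), new_codon.upper()
    if new_codon in STOP_NAMES and old_codon not in STOP_NAMES:
        return STOP_NAMES[new_codon]
    before, after = CODON_TABLE[old_codon], CODON_TABLE[new_codon]
    if before == after:
        return "silent"
    return f"missense {before}->{after}"


if __name__ == "__main__":
    for old, new in [("CAG", "TAG"), ("TGG", "TAG"), ("CAA", "TAA"), ("CGG", "CAG")]:
        print(old, new, classify_change(old, new))
```

For example, CAG to TAG and TGG to TAG are reported as amber, CAA to TAA as ochre, and CGG to CAG as an arginine-to-glutamine missense change, the same effects listed for those codon changes in Table 5.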
Table 5 

Locations of T7 point mutations in the nucleotide sequence 
Mutation 



Nucleotide 



Change 



0-3-CR35b 
0.3-CRI7 

0*3-CR3b 
0-3-ORlob 



2-64 

2- 139 

3- 285 
5-29 

«?5-Lvsl3a 
*-2<l5 
o-l»8 
55-B64b 

55-B31a 

6"- 147 
7-36 

7-213 
7-3-40 



Amino acid change 



7-5-405 



914 
926 
926 
997 
1 189 
1168 
1175 
12U8 
8947 
9««l 
lit. 332 
1<».5(»6 

lo.sos 

13.186 

16.351 

16.681 

16.953 

16.541 

17.137 

lN.32* 

19.181 

19.478 
19.538 

19.666 



G 

atg 

ATG 
TAT 
CAA 

era 
C(x; 

TGG 

tgg 

TGG 
VAC 
GAG 
(AC 

tgg 

(AC 
GTT 
CAG 
A(V 



cgg 
cag 
tgg 



to A 
to ACG 

to acg 

to GAT 
to TAA 
to TTG 

to gtg 

to TAG 
to TAG 
to TA(; 
to TAG 
to TAG 

to tag 

to TAG 

to tag 
to ATT 
to TAG 
to ATC 

to cag 

to TAG 
to TAG' 



T(T to TTT 

OGT to GAT 

CAG to TAG 



fMet to Thr 

fMet to Thr 

Tyr24 to Asp 

Gln88 to ochre 

LeuSl unchanged 

Ala83 to Val 

Trp94 to am()er 

Trp)6 to aml>er 

Trp+4 to am(>er 

Gln25 u> am her 

Gln83 to amber 

Gln34 to amber 

Trj>541 (478 in 4B) to amber 

Gln667 to aml>er 

Val67 of 5-3 to He 

Gln34 to amber 

Thr2<» of o-3 to He 

Arg95 to Gin 

Gln323 to aml>er 

TrpI7 to amber 

creates new Acc\ site 
8erl 16 to Phe 
Glyl to Asp. with 

retention of initiating Met 
Gln44 to amber 



-K.^£^^^^^i^«V^^ («*,«- Dunn 
were reported bv Dunne/ „ 9 7 8 , LTdZ *i fc^EE™ *° ° f ""^ ,2 - ,0 ° 
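
The codon changes listed in Table 5 can be translated into amino acid changes mechanically. The following is a minimal sketch, assuming the standard genetic code and the usual names for the three termination codons (amber for TAG, ochre for TAA, opal for TGA); the function names are illustrative and are not taken from the paper. Residue numbers are not computed, and an initiating ATG is reported simply as Met.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

THREE_LETTER = {
    "A": "Ala", "C": "Cys", "D": "Asp", "E": "Glu", "F": "Phe",
    "G": "Gly", "H": "His", "I": "Ile", "K": "Lys", "L": "Leu",
    "M": "Met", "N": "Asn", "P": "Pro", "Q": "Gln", "R": "Arg",
    "S": "Ser", "T": "Thr", "V": "Val", "W": "Trp", "Y": "Tyr",
}

# Conventional names of the three termination codons.
STOP_NAMES = {"TAG": "amber", "TAA": "ochre", "TGA": "opal"}


def residue(codon):
    """Return the three-letter residue name, or the stop-codon name."""
    codon = codon.upper()
    if codon in STOP_NAMES:
        return STOP_NAMES[codon]
    return THREE_LETTER[CODON_TABLE[codon]]


def classify_change(old_codon, new_codon):
    """Describe a codon change in the style used in Table 5."""
    before, after = residue(old_codon), residue(new_codon)
    if before == after:
        return f"{before} unchanged"
    return f"{before} to {after}"


if __name__ == "__main__":
    for old, new in [("CAG", "TAG"), ("TGG", "TAG"), ("CAA", "TAA"), ("TAT", "GAT")]:
        print(old, "to", new, ":", classify_change(old, new))
```

Only two codons, CAG (Gln) and TGG (Trp), appear as sources of amber mutations in the table; each reaches TAG by a single base change.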

(St^TS^ "TT ° rigina,,y aSSigned to a ™& 8«. gene 7 
inH rT TK ^ ° Und 40 1,6 m tw ° ^es. now designated Jnes 7 

Ti The mutants originally assigned to gene 7 were the onlv SeTSS Jt 

not possible. Therefore, five mutations that were found to lie between eenes 6 and 

^'7^ !T Ta«o mg,e g6ne Simp,V »»»■« thev '«v *» the «^ST^ 
map (Studier, 1969). It was subsequently observed ' however that S,e .11 7 
mutant* did not a,l have the same p.atin/behavior on all ^ (Studied SZy 

"^m**^ A culture of E. «* C gro „ing in 

partic.es/ce..: S0- M . sanies J^^ttm^^^S^^^. ' 5 ^ ^ 
and at 2 min intervals after infection the Zn. „ J, ' S J Inetn,on "« , immediately before 

containing sodium dc^ec^.Tate a "d 1^ "^ 

containing Tns/glvcine/ch.oride disc.ntinuouf^ S^T|£'^ ^ ?"*■?- 
eradient gel having a 5% stackine eel followed l,v . . j- , - o to 20% polyacrylamide 
Studier (19736). The oririn of eleTt™ n SLi .... ■ mul f™ A «V™t*>?- essentially as described l,v 
each 2 min pu.se is givef aoove^h S"^ ^ ***** ^™ Th < *•* « the beginning of 
infection under thei cond^o,L ThV^e nl^f wo »" ■»"»">• begin about 25 min after 

»f the patterns, as identified b^ sTud^S^) Pr0 " ,,neBl ^ Pr ° teinS "* 10 the *»' 
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~ 7 ■ n . five r gmal « ene 7 mutan * has been sequenced in the 
gene 7 region. Only one change was detected in each mutant: two of the
mutations affect gene 7 and two affect gene 7.3; one amber and one missense
mutation was located in each gene. Mutations in genes 7 and 7.3 appear to affect
the host range of T7 (Studier, 1975a), and the apparent amber phenotype
originally observed may reflect a host range property rather than the presence of
the amber suppressor.

id.nt r fi !li e 50 ? 0 V Bd ? f neS ' thC Predicted pr ° teins that have vet been 
.dent,fied geneucally or biochemically are those specified by genes 1-4. 15, M . 

38, o-3, 6-3 6-5, 6-7, ,7 W - a and 79-5. Deletion mutants that affect the coding 

sequences of seven of these genes have been isolated (Studier et al., 1979; and
unpublished results); the only close-packed genes not yet known to be affected by
any available mutation are 6.3, 6.5, 6.7, 18.5 and 19.5. Proteins whose coding

natZTT , ^ '° 0ked f ° r by C ° m P arin « the electrophoretic 

patterns of proteins produced from wild-type or mutant DNA, and preliminary
results have tentatively identified two of the predicted proteins. It might also be
possible to identify the predicted gene products by use of antibodies raised against
synthetic peptides that have amino acid sequences predicted to be in individual
proteins, as described by Walter et al. (1980) and Sutcliffe et al. (1980).

(c) Packing 

The coding sequences of the 50 close-packed T7 genes occupy 91.9% of the
nucleotide sequence (Tables 1 to 3 and Fig. 2), and where any sizable non-coding
sequences occur, recognizable genetic signals are almost always found. The longest
non-coding regions lie at the two ends of the DNA; if these are excluded,
95.0% of the internal DNA is coding sequence.

Taking the coding sequence for a protein to extend from the first nucleotide of
the initiation codon to the last nucleotide of the termination codon, there are 12
cases among the close-packed genes where the coding sequences of adjacent genes
overlap. Seven of these are one base-pair overlaps of the termination codon for
one protein with the initiation codon of the next. The other overlaps are 26 base-pairs
(between gene 3.8 and the first of the two protein start sites in gene 4), 20
base-pairs (genes 2.8 and 3), 14 base-pairs (genes 6 and 6.3), eight base-pairs
(genes 1.7 and 1.8), and four base-pairs (genes 0.5 and 0.6). The ribosome-binding
and protein initiation sites for 19 of the close-packed T7 proteins lie partly or
wholly within the coding sequence for the preceding protein.

Where adjacent coding sequences do not overlap (37 cases), the gap between
coding sequences ranges from 0 to 258 base-pairs (average 52). Nineteen of these
gaps, 25 to 258 base-pairs long (average 87), contain promoters for T7 RNA
polymerase, transcription termination sites, RNAase III cleavage sites and
origins of replication; and six of these 19 gaps contain both a promoter for T7
RNA polymerase and an RNAase III cleavage site (see Fig. 6). These signals are
discussed in the following sections. Seventeen of the gaps, 1 to 58 base-pairs long
(average 16), contain none of the above genetic signals, nor any others we could
recognize, except for ribosome-binding and protein initiation sites.
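
The counts in the two paragraphs above can be cross-checked with a little arithmetic. The sketch below is only a consistency check on the rounded figures quoted in the text. It assumes that the one gap not covered by the two groups (19 + 17 = 36 of 37) is the single gap of 0 base-pairs implied by the stated range, and the variable names are illustrative.

```python
# Fifty close-packed genes give 49 junctions between adjacent coding sequences.
GENES = 50
junctions = GENES - 1

# Overlapping junctions: seven one-base-pair overlaps plus five longer ones.
n_overlaps = 7 + 5

# Non-overlapping junctions as (number of gaps, average length in base-pairs).
# The (1, 0) entry is an assumption: one gap of 0 base-pairs.
gap_groups = [(19, 87), (17, 16), (1, 0)]

n_gaps = sum(count for count, _ in gap_groups)
mean_gap = sum(count * avg for count, avg in gap_groups) / n_gaps

print("junctions:", junctions)
print("overlaps + gaps:", n_overlaps + n_gaps)
print("mean gap (base-pairs):", round(mean_gap, 1))
```

Because the group averages are themselves rounded, the recomputed overall mean is approximate, but it agrees with the quoted average of 52 base-pairs, and the overlaps and gaps together account for all 49 junctions.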
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result of evolutions- pL. bi „. of .J " *°„ Thle effl <"«>»y » piwum.bly the 
• DKA who* hl^^.t^S^Sr ""-a- **> 

4. Transcription of T7 DNA 
Work from a number of laboratories has nmH»,^ « 

1981: McAllister et al. 1981) T7 n\i ; c »„„ u 7 nn * Studier. 

first bv £. coli RX.A polvmerasV whir h ^ T^J from ,eft * 

newlv-made TT R^^A^, ^ ^ t T^^ ^ ^ and bv 
and the entire Son S of ^"^"^^arlv^ 

termination sites for boTh'poK-.n ha vel^T* < Md ^^on 
sequence: their positions ^ ^^^^^^ 
tramanpt.on pattern is represented in Figure 4 " d th<? total 

(a) Promoters for E. coli RNA polymerase

Three major earlv promoters for f /•/»/. pyi , i 
the non-coding region nearThe e ft end ^tTd\T^ A \" * 
Minklev & Pribnow 197<*> in JaI A (Dunn & Studier - 1973 

leftward promoter AO (also c^ed the TT F"*** ^ 

in option sties ^ Z^^^^^^ 



Fig. 4. Transcription pattern of T7 DNA (diagram not reproduced). The diagram marks the A1, A2 and A3
promoters, the TE and Tφ termination sites and the RNAase III cleavage sites, and shows the class I,
class II, class III and readthrough transcripts. The less efficiently cut sites R3.8 and R13 are
indicated by parentheses. The scale is in T7 units.
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•nd for min0P A0 ,„ d c ^ 'he thj« m . Jor promMM , 

.975 , m: m^,Z?Z^TZJ™ZTJZ 

location and start site of the minor R nm™* u " etnev - I980 >' a nd the 
promote™ diseased below, .„ gathered in " lth lhe «dd,t,o„al m ,„„ r 

promoters, and the last site can be assigned t« th» v V A ° and A3 

Stahl & Chamber.in. 1977, ExamiSn of thJ FZT * ^ 1973 ; 

cleavage site at position 92 Z <tZ T nucleotide sequence around the 

to block cleavage. No potentia, 4htl^^^ « 
sites at 46-59 and 6316. but potential le«Wl ^ . dneart he cleavage 
(Table 6). potential leftward promoters were identified 

Experiments to clone fragments of T7 DNT4 int« ~i„ j 
that several different minorVomotel incfudingTe bTh £ 
utilized in plasmids (Studier & Rosente" ,981 !„H Ti^' M be 
Promoter activity is inferred because foments of T7 DNA in'",, H ^ T^" 
predicted promoters is directed toward a T7 In. A ? Wh ' Ch ° nC ° f theae 

site of plasmid P BR3 22 . ^^S^JZ^^t^ ^ 
promoter can be cloned in the silent buTno th„ ? ge ™ but n0t the 

the tetracycline promoter t^T^^' ^ 

transcribe the T7 gene in the plasm id .mTi. i '« ° r P romoter ca " 

p.asm.d-containim/ce,,. Exa^^he » *• 

where such promoters would be ex Dec ted to li* h» a m the re g ,orls 

near positions ^ 

-n^^ ^ Table 6. 

^ebenlist * al. (,980,. generally havl a go^ ma^oTh T T G K 
the -35 region, to the A residue nine nucleotides ahead of thi» T's f q ^ m 
2 and 6 of the T- A T- A A T in *h<> cieouaes a " ead of this, and to positions 1. 

positions 3 to 5 of the T-A T A ~ .T^T k ^ ,CSS homolo gy *> 
analvzed bv Sv^VJaw'^ T ^ » r ° moU>r * 

and A3 promote, to the t^L %£? m fh^f 

the minor promoters, sueeestine that „r * " t . in ^ v ****** "wwi that of 

^ md^ „ r i*K.?rtas : h .TdL p ' rticu, "' y 
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and A2 have five A residues in a row and 41 h» r 
in this region Several of th» m ; ^ f ° Ur a row and *^«n of eight 

pair, in this re^on L t no J Z ™ Com P a ^ly rich in AT bat 

(RNA, strandTat' the ZL ^^"T**? ° f A residues » *• « 
P-note ra , perhaps *n \?Z >ZT^Z\^t f A2 ^ A3 
strings of A residues in the -45 reeion Th* " S,eben,,8t ** < l 9«» have 
DNAs or direct the transcription o7f Z 7 pT ° moU:TS are 8«™«Hy in phage 
or « to he -n^^^/T " « ^n 

extends to the -45 rpmnn nf *^ V' A Po'ymeraae apparentlv 

concentration o f * Z^IXZZ^I^* * ' 98< "- ^ • 
strength. strand ln th,s re gion increases promoter 

(b) Promoters for T7 RNA polymerase 

Oakley * a/., 1979- Rosa 1979 % ^ ( ° ak,e - v * Co,en >*n, 1977 
Hayward, 1979; Dunnl' 8^198 ^7 T*? Z ^ 19?9; B °° thro y d & 
Co mputer search of "*>■ 
finds 17 such promoters, all oriented for !™ ^ ^ both directio ns 
Individual promote, are ^^J^^^^^ <™* ?) " 
first transcribed from the promoter excent fTJn I S?*> ° f the & ne 

to be parte of replication origins T^tar th^tftanH^ ^ th<>Ught 

Dunn & Studier, 1981 ; StudVr & RoTnC 1981 ) " 8 r ° f ^ DNA 
m vivo and » vitro have shown that aH 17 5th ' ° f Pr ° m<>ter 

T7 RNA polymerase, and have n^Id „„ ^ Pr ° m0terS 04,1 1,6 utUized 
promoters (Golomb & ttambTri J ^ IT C ° nV,nCm « evidence for any other 
198, ; McAllister^ c^ i98i' : ^ : l7rcr ma ?T 9 ^ ,m Th I979 th Cart f r 
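
A search of this kind, scanning both strands for the conserved promoter sequence, can be sketched as follows. This is a minimal illustration and not the program used for the paper: it assumes the 23-base class III sequence shown in Table 7 as the query, and the mismatch threshold is a hypothetical parameter, since the criteria actually used are not stated here.

```python
# Conserved 23-base class III promoter sequence (positions -17 to +6), as in Table 7.
CONSENSUS = "TAATACGACTCACTATAGGGAGA"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def count_mismatches(a, b):
    """Count positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def find_promoters(seq, max_mismatches=0):
    """Scan both strands of seq for windows resembling the consensus.

    Returns a list of (start_index, strand, mismatches); start_index is the
    0-based index of the window on the strand given as input.
    """
    seq = seq.upper()
    size = len(CONSENSUS)
    targets = (("+", CONSENSUS), ("-", reverse_complement(CONSENSUS)))
    hits = []
    for i in range(len(seq) - size + 1):
        window = seq[i:i + size]
        for strand, target in targets:
            m = count_mismatches(window, target)
            if m <= max_mismatches:
                hits.append((i, strand, m))
    return hits


if __name__ == "__main__":
    # Toy sequence (not T7 DNA): arbitrary flanks around one consensus copy.
    toy = "GGCTCGAAAT" + CONSENSUS + "ACCCGT"
    print(find_promoters(toy))
    print(find_promoters(reverse_complement(toy)))
```

Raising `max_mismatches` would be needed to recover promoters that depart from the class III sequence, at the cost of more spurious hits; any real threshold would have to be tuned against the known promoters.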

The promoters for T7 RNA polymerase in T7 nvi u 
three groups, referred to as claTlI c ^ rn Vr v * ^ mU > 
their location, utilization and nuc eoS 1<L„ "t^™ P romoter *- based on 
S tud^l98, Studier, Ro^b^:!^^ ^ '"^ ^ & 
— g enes. and 

1974: Xiles & Condit. 1975 • Mo Anfeter & vZ* V™ (G °'° mb * C hamberlin. 
Kassavetis & Chamberlin im Zt^rTr ^ ' McAI, " ter & VVu " ,978: 
1981). They a„ have exactlv ttm i t l»i^T 

*PPea- " — They 
class III promoter sequence mlZ two^to ^ ^ ^ th * 

promoters, *7/A, tf.J B and AT J * ~™ P 0 "'"™- Three'of the class II 

they direct transcript on t^l cC ^1" (Ch " " but 

*. II premoters Itinues ^^^'"J 
Promote, are part of the primary origin of 3E£3^ D £ 
Table 7

Promoters for T7 RNA polymerase

                           RNA start site
Promoter               Nucleotide    T7 units

Replication promoter
φOL                         405         1.01

Class II promoters
φ1.1A                      5848        14.64
φ1.1B                      5923        14.83
φ1.3                       6409        16.05
φ1.5                       7778        19.48
φ1.6                       7895        19.77
φ2.5                       9107        22.80
φ3.8                     11,180        27.99
φ4c                      12,671        31.73
φ4.3                     13,341        33.41
φ4.7                     13,915        34.84

Class III promoters
φ6.5                     18,544        46.43
φ9                       21,864        54.75
φ10                      22,903        57.35
φ13                      27,273        68.29
φ17                      34,565        86.55

Replication promoter
φOR                      39,228        98.23

Conserved sequence

-10 



-20 



TA A T A CG A CTC A CTAT£GGAGA 
-10 i 



-20 
AAf'GCCAAAT 
TTCTTCCGGT 
GGACTGGAAG 
AGTTAACTGG 
TGGTCAC'GCT 
AGCACCGAAG 
CGTGGATAAT 
CCGACTGAGA 
AGTCCCATTC 
TTCATGAATA 



-10 | 

CAATACGACTCACTATAGAGGGA CA 
TA ATACG A ( TO A CTAT AGG AGG A CO 
TA AT A CG ACTCAG TATAGGG A CA AT 
TAATA(^GACTCACTAAAGGAGGT AC 
TAATAOGAOTCACTAAAGGAGAO AC 
TAATACGACTCACTATT AGGGAA GA 
TAATTGAACTCACTAAAGGGAGA CC 
CAATCCGACTCACTAAAGAGAGA GA 
TAATACGACTCACTAAAGGAGAC AC 
CTA TTOGACTC A CTATAGGAG AT AT 
-20 -10 i 

GTCCCTAAAT T A ATA CG ACTC A CTA TAGGG AG A TA 
GCCGGGAATT TA ATA CG ACTCA CTATAGGGAGA CC 
ACTTCGAAAT TAATACGA CTC A CTATAGGGAGA CC 
GGCTCGAAAT TAATACGACTCACTATAGGGAGA AC 
GCGTAGGAAA TAATACGACTCACTATAGGGAGA GG 

-20 -10 J 

OACGATAAAT TAATACGACTCACTATAGGGAGA GG 



The 1 ** ^ — ~ ^ 

^uCrS"!*^?^ 1 ? e nUC,e ° tide ^ ue "^ a re from the / 
The 23-b«e con^ri sequence fc^offZT^ . e <* ulva,ent *" ««* promoter 

the top of the Table nucleotide. tha^^h^T " 8tenSk8 ^ COnserved ""J"*™* « 

the letter, tho* tha, ^ f ound ^ 6 of^Tn^ " pr ° m0ten ' a * * » line above 

nucleotide of, he RXA chain is underlie l£. pron, °J ere L are indicated by a dot. and the first 

■* unoerlmed. The sequence hyphens have been omitted for claritv 
1980; and our published results), and could therefore be considered to be
replication promoters as well as class II promoters.

The φOR promoter is considered to be a replication promoter because it lies
within a fairly long non-coding region (258 base-pairs) and is part of a replication
origin (unpublished results). However, it might also be considered a class III
promoter: it directs transcription of one small gene, gene 19.5; it completely
matches the conserved class III promoter sequence; and it appears to be utilized
as efficiently as the class III promoters in vitro (Golomb & Chamberlin, 1974). The
φOL promoter, also thought to be part of a replication origin (Dunn & Studier,
1981), differs from the conserved class III promoter sequence in two positions
(Table 7). Although the φOL promoter can be utilized in a restriction fragment of
T7 DNA (Osterman & Coleman, 1981), and may indeed direct transcription of the
entire early region in vitro (Scherzinger et al., 1972), there is no evidence that φOL
directs transcription of the early genes of T7 in vivo (McAllister & Wu, 1978;
McAllister et al., 1981; our unpublished results).

Uninterrupted homology between pairs of promoters can continue past the
conserved sequence of 23 base-pairs: the φ10 and φ13 promoters are identical for
30 consecutive base-pairs (-24 to +6); the φ6.5 and φOR promoters match each
other for 28 continuous base-pairs (-22 to +6) and match the φ10 and φ13
promoters for 27 consecutive base-pairs (-21 to +6); the φ9 and φ10 promoters
have a continuous match of 26 base-pairs (-18 to +8); and the φ17 and φOR
promoters have a match of 25 base-pairs (-17 to +8). The longest perfect match
involving class II promoters is between the φ1.6 and φ4.3 promoters, 25
continuous base-pairs from -17 to +8.
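
The lengths of these uninterrupted matches can be checked directly from the aligned sequences in Table 7. The sketch below uses the φ10 and φ13 rows as transcribed from that table and assumes that each row runs from position -27 to +8 with no position 0 (the start site is +1, the base before it -1); the function names are illustrative.

```python
# Rows transcribed from Table 7: ten flanking bases, the 23-base conserved
# region (-17 to +6), and two further bases (+7, +8).
PHI10 = "ACTTCGAAAT" + "TAATACGACTCACTATAGGGAGA" + "CC"
PHI13 = "GGCTCGAAAT" + "TAATACGACTCACTATAGGGAGA" + "AC"

FIRST_POSITION = -27  # assumed position of the first base in each row


def position(index, first=FIRST_POSITION):
    """Convert a 0-based row index to a promoter position (no position 0)."""
    p = first + index
    return p if p < 0 else p + 1


def longest_identical_run(a, b):
    """Return (length, first_position, last_position) of the longest run of
    identical bases between two aligned, equal-length sequences."""
    best_len, best_start = 0, 0
    run_start = None
    for i, (x, y) in enumerate(zip(a, b)):
        if x == y:
            if run_start is None:
                run_start = i
            if i - run_start + 1 > best_len:
                best_len, best_start = i - run_start + 1, run_start
        else:
            run_start = None
    if best_len == 0:
        return 0, None, None
    return best_len, position(best_start), position(best_start + best_len - 1)


if __name__ == "__main__":
    print(longest_identical_run(PHI10, PHI13))
```

For these two rows the longest run is 30 bases, from -24 to +6, in agreement with the figure quoted for φ10 and φ13.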

The 11 promoters that differ from the conserved class III promoter sequence do
so mostly in positions +3 to +6: only 50% of these bases match the class III
sequence, whereas over 90% of the bases in positions -17 to +2 match the class
III sequence. What appears to be conserved in the region past the start site is the
presence of purines in the l strand: 92% of the bases in positions -1 to +6 of
these 11 promoters are A or G, and all 17 promoters have an uninterrupted string
of 5 to 12 purines that includes the start site for the RNA. Considering all 17
promoters, nine positions are completely conserved and four more are the same in
all but one promoter (Table 7). These 13 most highly conserved positions lie
between -16 and -1, suggesting that the precise sequence of this region,
together with a polypurine tract in the -1 to +6 region, are important factors in
defining a promoter.

Panayotatos & Wells (1979), Rosa (1979), and Osterman & Coleman (1981) have
reported that removal of DNA to the left of the HpaII site at position -21
relative to the φ1.1B start site, to the left of the HpaII site at position -24
relative to the φ9 start site, or to the left of the TaqI site at position -23 relative
to the φ13 start site, has little effect on promoter activity for purified T7 RNA
polymerase. Apparently, promoter activity requires no sequence information, nor
any DNA at all, to the left of position -21 or so. However, removal of the DNA to
the left of position -10, by cutting at the HinfI site found in most promoters,
abolishes promoter activity (Rosa, 1979; Oakley et al., 1979; Osterman &
Coleman, 1981). Osterman & Coleman (1981) found that replacement of the DNA
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^^^^=^^^ 

the left of - with et anothr ^ ? ^ PBphced the ~*™* 10 

promoter. »h,ch has an uninterrupted string of ten A T base-rjair, Th. Il* 
S^tZT to * " "" ,, , iVC ' V " eak " ~" " ilSd 

position -13. P " A T ta »-I»"» extending leftward from 

G M S.TV' " Pn "°'" < "' "I"""™* < Tible 'I 1>« II but two have 

(ChaTborita 1 r k "° Wn to be « in * lm » s ' with GTP 

pva - v * j 7 e observ ^ incorporation of |y- 32 P14TP intr, 

named because it lies within thTJT e *<; e Pt>on » the promoter (so 

coding gaps Sat g f ° T & ne 4) ~ The smal1 ^ "on- 

(^Ta oft l?" Pr r° terS 25 base P ai - WW) *nd 27 base-pairs 

base-pain^ 'long " ^ ^ """i" pr ° m ° tere 

are at least 63 



(c) Transcription termination sites 
Transcription of the early region of T7 D\4 hv F ^ pva i 

ai. nucleotide 7588 and one-third at the following G (Dunn & 
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Stud.er. 1980). The nucleotides in the RNA preceding the terminal nucleotides can 
be arranged I ,„ a relatively stable stem-and-loop structure that eon^TLSI 
ZdTt™ an<J * f ° Ur " baSe ,0 ° P (Fi « 5 ^ a structure com m ?nly 

Court. 1979) Term.nat.on at this 8 ,te is rho-independent and is not completely 
effioent m vtvo or m vUro (Millette et al.. 1970: Studier. 1972: Kiefer e7S 
We refer to th.s transcription termination site as TE (terminato^^ 
ev,dence that transcription by E. cdi RNA polymerase can terminal sLcZSlv 
at two additional sites in the late region of T7 DNA fi~r 8 l»ca»cal £ 

(Minkley * P ribnow , 1973: Petere f HaywIL ,^4,,^ TIonTTt 
transcription termination site for T7 RVA nol vma J second at the 

(unshed results, Termmation at^ftes ~ beTeL ST t £ 

Transcription by T7 RNA polymerase does not terminate at TE, but proceeds 
st fof^ rTa^. "T 1 ST & StUdier ' 1980) " ° ne 

RNA nucleotides in the non-coding gap between genes 10 and 11 can be arranged
in a stem-and-loop structure that contains 14 to 15 uninterrupted base-pairs and a
six or eight-base loop (Fig. 5). This stem ends in a string of six consecutive U
residues, the only place in the entire T7 DNA molecule where as many as six

i^w (po S ,t.on 60-62), the first nucleot.de past the string of U residues 

Fig. 5. Transcription termination sites for E. coli and T7 RNA polymerases. Potential base-paired
structures ahead of the site of termination are shown (stem-and-loop drawings not reproduced).

chains in TE mainly at the underlined C but aCt l^„f • * Pp ,vmera8e «*™inates R.VA 
Studier. 1980). E coli R\A rolvmerai »i« " * ^ tK * t "" e at the "«* G ""Wue (Dunn & 

position is no't known (unpubTish^ ^u^T A poCe^ ^ " T * ^ «" 

TE ,Du„„ * Studier. but *r£a«^ 
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completely efficient tenninTtionat' his ?" ( ^ A,Hster * «*■ 1981,. In fact, 
promoter for T7 RX A pol^^ J^ h W ? uld * lethal *> T7: there is no 

«. which ^^jrrt^T^^^^ 77 ^ 

transcribed entirely by readthrough of Z ^ * ^ mUsl * 

A termination site just na i / j/j g ' '' 

strategy that ensures production ofll " W™*? ° f a transcriptional 
T7. specified by gene ^T^ ^T^ "VT T*" ^ d ^ * 
from three da,, HI promoters J. f" ° f U,e * ' aSS 11 P«»m»ters and 
amount, of gene 70 LkC^J XL ™" ^ '* Large 
mRXAs might well be deleteriou"7nH " ^, U,va,enl am " u "'* <>' downstream 
•nay provide a proper baW ' part ' a ' '~<ion site behind gene JO 

5. RNAase III Cleavage Sites 

a ,,OS, KXAase III. is . 

•973,1975). The precise llu To deal , ^"V*" I*™ & Studier. 
cleavage sites in the eJT^io, ' ,"7 ^ ^""^ for the fi - 

Rosenberg* Kramer. IW7- rE^TS ' o-- ^ Wnbe «* * «7.. 1974: 
UNA around eaoh of these ZT^l , ° W m P uhlish ^ «"lf). The 

--pairing within which liesTe ^ofK Ip* tTT? ^ ° f 
previous nomenclature for RN w m J ,ea ' a S e <*'6- 6). In a change from our 
ww refer to each site bv R follow^ bv tlT ^l^r ^ & Studier - 1981 )• « 
he cleavage site. In tt^t? ^v fatgW,0,h, * t4f 
re RM. R0 5 . R/, R/ . 7 and S * ^ RNA *« 111 dea ™ g e sites 

RNAase III cleavage sites in the T7 late R\ A S ha v.K^ i 
id.rect evidence suggested that sites should he aUeitTt bUt 
etweeo genes 6 and 7. between gene, 72 and /? lea 1 t . between &™ 3-5 and 4. 
Dunn * Studier. ,975.1980: Stud'er 1 7^ P^hll y^i?" " Md 
ie pa,ring potential of nucleotide ^JI U " g I978) Analysis of 

enes. and comparison ^TLT^ m *** ^ tb ' ' a * 

eavage sites ahead of genes 47 6 S anrf L g * identifies ,ikeh - 

tes might be expected i he within exl * . 6, - J ° ther dea -g- 
*• f. 73 and 77. and ne^i a T ^ Pa,md ^ ahea< ' ° f >"* s 

-es 76 and 77. (We l-ve^^^^ 1 *™ terminatio " between 
rmination site for £ „* RV A " , J'" 1 "- ,980) f ' hat the trwwcriptioi, 

-age site for RNA^e HI , ^ ZT ? * ° f ^ ^---t a 
omoter sequence for T7 RX A m.lvnT^U ^ ,tentia) dea ™ge sites, a 

*ded for RNAase III cleavage " ^ *** tho ^ h, ^ be 
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iri^Hi EaCh ° f the Plasmids oontained a promoter for T~ rva ■ 
oriented so as to direct transcrintion in *h P romoter for T7 R.\A polymerase 

from the BamHl site of^SS a H . (counter cl ^« ise, direction 
termination sites for T7 R\4 ™T * " *' hi ° h there are ™ strong 

1981). Except for P AR436^h^carrie^e tJ? ^ (MeAUi>ter - 
transcription of the plasmids bv T7RY a 7* tem,,nation sit * T7 DNA. 

P b) 77 R * NA P°'ymerase produces a heterogeneous 
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( R PlaSmjd DXA « ™ transcn^i £SK ^ -or.b.nant plasmids. T7 DXA or 

( J-Jfcwdwil m the presence of | a ^P ]VT P Jd l ^ZP (a gift from C. Fuller & 

a buffer containing <* „.>;h 4 C1 eJentMv « A ' ^ V w ith ' ,urified ***** II] in 

formaldehyde, and subjected to electro,^^ UiSih^ £ ""^^ buflfer containing H M . 
same solrent. folded by autoradiogCh^ ^ ^ ^ 2 V ' c ™ in the 

I>NA inserts into the Bam HI site of nBR322 in th T P^mtd* contained fragment, of T7 

KosenUetg. 1981 Tw of the fr^mLte SnL ^ < counter ^«*» -ienUti™ <Studier A 

were cWi along with the */0 ££££ mCe „™ "° f ° r ^ RXA Polymerase, and these 

R-M transcribed from T7 DXA itself the fir*Tv £ RVa , ^ ^ « n 'l'»* «ontain*d 

^J*«en* ^ Fig. 4. Tables 8 and 9, The p^mid °" "^T *~ given at the side, of 
-looed fragments, and the promoter* «„H .i 1 <l«tgnatio ns . the lo<^tions in TT I)\A of th^ 
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population of RXA* longer than the pla«mid DNA If an RVAasp ^ • 
P-nt in the RXA bribed from L inaerted f^?rf5T D ^3j;?- 
ane cut to a homogeneous length about the size of the plasmid DXA but ff t 
cleavage s.te .s present, no specific cleavage products are o Served -» I ♦! 

,h,n lh " RN ' A "™"' ced « - uT°™£ 

The results of Figure 7 ciearlv identifv three nrimarv RVa, . ttt i 
and /*■.,. In addition, a les* efficient cleavage site is oh^rv^ u u L 

» «; ,7 ".; iz z t ' n,he '^""^ ^ ^ »^» » 

». or // . or at the T<£ transcription termination site 

Even though no evidence for an RXAase III cleavage site ahead of™™ ? * 
found from transcribing plasmid DXAa. at least t^ot^X^Te'lZ 
point to a cleavage site in RXA near position 28 in T7 DVV m RVWm 
treatment re.ea.es a small RXA fragment (approximate,, ^ ^ eo t ^sTf " m 
RNA transcribed by B. coli RXA polymerase from mutant T7 DX^ thattck 
he transection termination site at the end of the early region (DM 4 t£ 

ne_ar pos.t.on 28. (>) Appropriate in vitro transcripts of full-length or fragmented 

ssrs^^-^- ^^^Lnst-^: ^-^^^r^S? 

SSL'S ** " — ' m 

E^minahon ..r,,,,,™,/., Mrmlara i(| ,„ non . c0<ii , 

»tn, r | Ures are al»> p„ Me „ ,„. « »■ "« »l«m.tive 

th« lr,„« ri|„s f„„„ r,.l|.t.n K .l, „ fra^,l„m DX^', " f ,u PP """ '" 

irom tne cloned 1/ UA.A and .some sequence in thp RVA n ^ j r 

;t ^.ttsrcr ,e «■ b « 

thr» elhuent Rx., w III rleav,,,. ,„„ (R4 . r . Re,, , od RM . 51 <nd 
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relatively inefficient cleavage sites (R3 X and R/?> Th * ^ ■ , 

S ui ™, /. noo and K7^. which are indicated in Figure 6. 

6. T7 Messenger RNAs 

*J*L ■ °u netfler the RNA ^gan at a promoter or at an RXAase III 
RNls^ cLTT T'^r inten al The ' WSlfc '° ns and — of all ofX T 

the RNAs expect in t elw R \Tjt "I ^ 8 *" d 9: Tab,e 8 Iiste 
used ft, ^ro^Table 9 the RV 4 RN A f . IH f*™^ (""Mtkm. frequentlv 
BHA-J HI Lvai 9 ^ ~ * ^ of 

nomentt^ ts^ ^ e RXA »™^.v- « have adopted a ,v Stem atic 
te^inationTi^^i^He^xt" ^ ^ 

and 9,. The RNAs that ^^Z^^^R ™» 8 

bv Golomb & Chamhprlin na-7> l ° j i, P ec ' e& « to M in the nomenclature used 

a *» - 2Ti5J iX'i^Stitt ,he Ta f b ? *" * **• 

RKAase III cle.v«ge. ' As < " :p< *' ,ed W te •~T>«ed by 

ira| R i"l'! l "J ,d! ' 5t * b ' e •* " m !»"»"««• 1969-1970: Mam 4 \Wskv 

Ch^berlin, 19, 4; 7 Si" 975 P 'h,', V """""' S ' 1969: G * , " b * 
«e FV 7) Late RNAr Mw . 4 1978; our »"P»>>l»l.ed 

•gree-en. oe.we*, the p^L/.td "l^VA .Tf' ' ""P"* ^ 



Table 8

Predicted T7 RNAs, unprocessed by RNAase III

                      Position of RNA b              Length of RNA
RNA a                  Left      Right     Nucleotides   T7 units   Mr (Na+ salt)

E. coli RNA polymerase
      A1t               498       7588         7091        17.76      2,442,000
      A2t               626       7588         6963        17.44      2,397,000
      A3t               750       7588         6839        17.12      2,354,000

T7 RNA polymerase
      φOLt              405     24,209       23,805        59.61      8,199,000
      φ1.1At           5848     24,209       18,362        45.98      6,325,000
      φ1.1Bt           5923     24,209       18,287        45.79      6,299,000
      φ1.3t            6409     24,209       17,801        44.57      6,131,000
      φ1.5t            7778     24,209       16,432        41.15      5,660,000
      φ1.6t            7895     24,209       16,315        40.85      5,620,000
      φ2.5t            9107     24,209       15,103        37.82      5,203,000
      φ3.8t          11,180     24,209       13,030        32.63      4,488,000
      φ4ct           12,671     24,209       11,539        28.89      3,973,000
      φ4.3t          13,341     24,209       10,869        27.22      3,743,000
      φ4.7t          13,915     24,209       10,295        25.78      3,546,000
IIIa  φ6.5           18,544     24,209         5666        14.19      1,952,000
IV    φ9             21,864     24,209         2346         5.87        808,400
V     φ10            22,903     24,209         1307         3.27        450,000
      φOLrt             405     39,936       39,532        98.99     13,610,000
      φ1.1Art          5848     39,936       34,089        85.36     11,740,000
      φ1.1Brt          5923     39,936       34,014        85.17     11,710,000
      φ1.3rt           6409     39,936       33,528        83.95     11,540,000
      φ1.5rt           7778     39,936       32,159        80.53     11,070,000
      φ1.6rt           7895     39,936       32,042        80.23     11,030,000
      φ2.5rt           9107     39,936       30,830        77.20     10,620,000
      φ3.8rt         11,180     39,936       28,757        72.01      9,900,000
      φ4crt          12,671     39,936       27,266        68.27      9,385,000
      φ4.3rt         13,341     39,936       26,596        66.60      9,155,000
      φ4.7rt         13,915     39,936       26,022        65.16      8,958,000
      φ6.5rt         18,544     39,936       21,393        53.57      7,363,000
      φ9rt           21,864     39,936       18,073        45.25      6,220,000
      φ10rt          22,903     39,936       17,034        42.65      5,862,000
II    φ13            27,273     39,936       12,664        31.71      4,360,000
IIIb  φ17            34,565     39,936         5372        13.45      1,848,000
VI    φOR            39,228     39,936          709         1.78        243,200



a RNAs produced in the absence of RNAase III are referred to by the promoter from which they
originated: the suffix t identifies RNAs that end at the first appropriate termination site (TE, Tφ, or
the right end of T7 DNA), after having passed through one or more RNAase III cleavage sites; the
suffix rt identifies RNAs that were produced by transcription through Tφ and that end at the right
end of the DNA; the promoter designation itself, without a suffix, refers to RNAs that end at Tφ or at
the right end of T7 DNA without having passed through any RNAase III cleavage sites (see Fig. 4).
The RNAs that correspond to species II to VI described by Golomb & Chamberlin (1974) are
indicated.



designation itself (without any suffix) refers to an RNA that has its right end at the first RNAase III
cleavage site or termination site; the suffix r identifies RNAs produced by readthrough of Tφ and
which end at R13; the suffix p refers to an RNA that escaped cleavage by RNAase III (presumably
only at R3.8 or R13) and which therefore ends at the second RNAase III cleavage site from the left
end of the RNA (see Fig. 4). The RNAs that correspond to species IIIa to VI described by Golomb &
Chamberlin (1974) are identified. The φ6.5, φ9, φ10 and φOR RNAs are unaffected by RNAase III
cleavage and are therefore identical to the RNAs of the same designation in Table 8.

b The nucleotide numbers of the predicted first and last nucleotides in the RNA chain are given.
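
The length columns of Table 8 follow from the left and right positions by simple arithmetic. The sketch below is a minimal illustration, assuming a genome length of 39,936 base-pairs and taking one T7 unit to be 1% of that length; the function names are illustrative. The Mr column is not reproduced, since it depends on base composition.

```python
GENOME_LENGTH = 39936  # base-pairs of T7 DNA


def rna_length(left, right):
    """Length in nucleotides of an RNA spanning left..right inclusive."""
    return right - left + 1


def t7_units(nucleotides, genome_length=GENOME_LENGTH):
    """Express a length (or a position) as a percentage of the genome length."""
    return 100.0 * nucleotides / genome_length


if __name__ == "__main__":
    # A1t: from the A1 start site (498) to the early terminator TE (7588).
    n = rna_length(498, 7588)
    print(n, round(t7_units(n), 2))
    # φOLrt: from the φOL start site (405) to the right end of the DNA.
    n = rna_length(405, GENOME_LENGTH)
    print(n, round(t7_units(n), 2))
```

The same conversion reproduces the T7-unit positions of promoter start sites, for example nucleotide 5848 at 14.64 units.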



Table 9 

Predicted T7 RNAs, after processing by RNAase III



Position of RXA b 



Length of RXA 



RXA 



Left 



Righi 



E. roli RXA polymerase 



Al 


498 




A2 


626 


890 


A3 


750 


890 


" 03 


UOl 


1468 


0-7 


1469 


O 1 OO 


/ 


3139 


5887 


11 


5888 


6448 


IS 


6449 


7588 


lymerase 






4>OL 


405 


890 


<t>l 1A 


5K48 


5887 


\R11 1 


5888 


<U48 


U/*/kJ 


5923 


6448 


61-3 


64(»9 


6448 


HJ3 * 


6449 


11.203 




7778 


11.203 




7895 


11.203 




9] 07 


11.203 


HJ-3p * 


6449 


13.892 




7778 


13.892 


<f>]~€p 


7895 


13.892 


4>25p 


9107 


13.892 




11.180 


13.892 


IK3 8] 


11.204 


13.892 




12.671 


13.892 


H-3 


13.341 


13.892 




13.893 


18.562 




13.915 


18.562 



nil 

IV 

v 



4& * 
4>U) * 

<f>10r 

[^6*«5rp] 
LRfrorpJ 



[ 



VI 



R13J 



18.544 
18.563 
21.864 
22.903 

18.544 
18,563 
21.864 
22.903 

18.544 
18.563 
21.864 

22.903 

27.273 
27,281 
34 .565 

36.856 
39.228 



24.209 
24.209 
24.209 
24.209 

27.280 
27.280 
27.280 
27.280 

36.&55 
36.855 
36,855 
36.855 

36.855 
36.855 
36.855 

39.936 
39.936 



Nucleotides T7 units M t (Xa* salt) 



393 
265 
141 

578 
1670 
2749 

.561 
1140 

486 
40 



40 

4755 
3426 
3309 
2097 

*7444 
6115 

5998 
4786 

[2713] 
L 2689 J 
1222 
552 

[4670] 
I.4648J 

[5666] 
L5647J 

2346 

1307 



[8737] 
L8718J 
5417 
4378 

[18.312] 
U8.293J 
14.992 
13.953 
[9583] 
[9575J 
2291 

3081 
709 



0-98 
0-66 

0- 35 

1- 45 
4 18 

6*88 
1 40 

2- 85 

1 22 
0-10 



135.700 
91.400 
48.580 

198.800 
575.600 
945.800 
193.300 
392.600 



[5C1] [ 140] [193.300] 

1*26 J L 1-32 J L>8l.4O0j 



0- 10 

11-91 

8-58 
829 

5- 25 

18-64 
1531 
15-02 
1^98 

6- 79 
673 
306 

1- 38 

1-69 
1-64 



[6-79] [934.800] 
L6-73J L926.600J 



[ 11-69] 
U1-64J 

[14-19] 



5-87 
3-27 

21*88" 
21-83. 
13-56 
10-96 



[21*88] 
L21-83J 

i 

[45*85] 
L45-81J 



37-54 

34-94 

[24-00] m [3.301 .000] 
L23-98J L3-298.000J 
574 



7*71 
1-78 



I67.7(K( 
13.860 
93.300 
81.400 
13.850 

1.638.000 
1.180.000 
1.140.000 
723.200 

2.564.000 
2.107.000 
2.067.000 
1,650.000 

800 
600 
420.100 
189.900 

.608.000 
601 .000. 

1 .952.000 
945.000 
808.400 
450,000 

[3.006,000] 
L3.000,000j 
1.863.000 
1-505.000 

[6.304.000] 
L6.298.000j 
5.161 .000 
4.803.000 

3.301 .000 
3.298.000 
788.800 

1.059.000 
243.200 



[!: 

r 

Li 



Pairs of RNAs that differ only by whether their 5' ends originated at a promoter or at an RNAase
III cleavage site in the same non-coding interval are bracketed. Asterisks identify late RNAs that
appear to be identifiable in gel patterns of in vivo and in vitro transcripts of intact T7 DNA.

a RNAs produced in the presence of RNAase III are referred to by the promoter
from which they originated or by the RNAase III cleavage site at the left end of the RNA: the
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the right end I of T7 DXA (see Fig. 4,. Perhaps these structure stabUize the T7 

7CT: n £ yt,c deg ^, ati T from the 3> end * and this - 

ass itipr why Rx - w 111 c — - - ^ * P~ 

Some T7 RNAs also have the potential for base-paired structures at their 5'
ends, but this does not seem to be the rule. RNA initiated by T7 RNA polymerase
at several different promoters has a polypyrimidine tract near the 5' end that
can pair with the polypurine tract that starts the RNA. In some cases a rather
large structure could be formed at the o end of the RXA (see "g 1,2 
Perhaps these structure, where they occur, also contribute to the^itv of £ 
R : NAs : or perhaps thev function to direct ribosomes more efficiently to 1* for 
initiation of protein synthesis. * 

All but four of the predicted T7 mRNAs code for more than one protein.
Monocistronic mRNAs are predicted only for genes 1, 1.3, 10 and [...], but except
for gene 1, the message for each of these proteins is also found as part of one
[...] mutations have no effect on the expression of down-stream
genes (Studier, 1972; unpublished results). Thus, there appear to be few polar
effects at the level of transcription or translation. The only polar effect so far
described (Saito & Richardson, 1981) is at the translational level: translation of
gene 1.1 appears to be needed to activate translation of gene 1.2, which follows it
immediately in the same mRNA. It seems likely that synthesis of most T7
proteins is initiated independently.



7. Synthesis of T7 Proteins
(a) Protein initiation sites
The 50 close-packed genes of T7 actually specify 51 independent protein
initiation sites: as noted previously (Dunn & Studier, 1981), gene 4 specifies two
overlapping proteins, which begin at initiation sites 189 nucleotides apart and
end at the same termination site. The nucleotide sequences around
the start sites for each of these 51 proteins, as well as those around the start sites
for the five potential overlapping proteins discussed in the next section, are
given in Tables 10 to 12.

The initiation codon for all but five of the 51 proteins specified by the close-
packed T7 genes is AUG: the gene 2.8, 6.3, 7.7, 17.5 and 19 proteins begin at GUG.
Ahead of each initiation codon in the mRNA is a ribosome-binding sequence of
from four to nine nucleotides, capable of uninterrupted pairing with nucleotides
near the 3' end of 16 S ribosomal RNA (Shine & Dalgarno, 1974; Steitz, [...])
[...] in the mRNA: the exceptions are genes 0.6 (A-A-G-G-G-C), 0.7 ([...])
and 1.8 ([...]). The distance from the A (or its equivalent) in the G-G-A-G-G
ribosome-binding sequence to the first nucleotide of the initiation codon
ranges between seven and 13 nucleotides. The shortest interval between the last
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Table 10 

Initiation sites for synthesis of T7 early proteins 
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Pairing' 


Distance b 


protein 


length 


A to ATC 


03 


5 


12 




^ 7 


13 


o*:>* 


* 6 


10 




6 


12 


()■? 


5 


10 


1 


4 


9 


11 


4 


12 


12 


(i 


12 


1-3 


i> 


II 


Average 


5-3 ± Hi 


n-2±i :* 



Potential pairing to 16 S rRNA(c)
3'-AUUCCUCCACUAG-5'



(■TAATAA(TU(A<iiA££ilAA(A( , AA(;AJXiGCTAT(.'T 
C(iA(i<.'A(:TAC (;AO(;A(;(; ATf;AAOAflTA atCi' Tctact 

TTTA(TTAT(;AOO<iAGTAATC:TATA3X;("TTArTAT(- 
(MiAATCATCAAAOC^CAfTACXMAAATiiATX.XACJC 1 
AA('(:AArATAAA(i^<A(AATCaAATCAA(ATTA(T 
TTA('TAACT(;(JAA(iA£iJiCA("TAAAJj(iAA(' A('(J ATTA 
A(:AATTA(TAA(;A<iA(iCiA(TTTAA(;TAT(:r-(;TAArT 
« AA(:< (.TA(:( T(j(!(iA(itj(:T( A(;TAA(:AT<:.o<:A(-(;TT 
ATTTA A< ( A.AT .At.CAC ATAA A( ATTAT(;.AT(; A AC AT 



a The number gives the length of potential continuous base-pairing between the sequence given and
the sequence at the 3' end of 16 S ribosomal RNA, including G·U base-pairs.

b The distance from the base that would be expected to pair with the central U of the
16 S rRNA sequence to the first nucleotide of the initiation codon is given.

c The sequence of nucleotides beginning at the 3' end of 16 S ribosomal RNA (Shine & Dalgarno,
1974) is given at the top of the Table with the central U underlined. The sequences of the individual
sites are given as DNA sequences (T rather than U) [...]

Regions of uninterrupted pairing are underlined, as are the potential initiation
codons [...]

The sequence hyphens have been omitted for clarity from this and other Tables.

paired nucleotide of the ribosome-binding sequence and the first nucleotide of the 
initiation codon is three nucleotides, the longest is ten. 
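
The pairing lengths tabulated for each initiation site can be approximated computationally. The Python sketch below is illustrative only and is not the procedure used to compile the Tables: the function name `longest_pairing` is hypothetical, and the 13-nucleotide 3'-terminal segment of E. coli 16 S rRNA (3'-AUUCCUCCACUAG-5') is assumed as the pairing partner. It reports the longest uninterrupted run of base-pairs over all alignments, optionally counting G·U pairs.

```python
# 3'-terminal segment of E. coli 16 S rRNA, written 3'->5' so that it can be
# compared position-by-position with an mRNA written 5'->3' (assumption).
RRNA_3_TO_5 = "AUUCCUCCACUAG"

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def longest_pairing(site, allow_gu=True):
    """Return the longest uninterrupted run of base-pairs between `site`
    (DNA or RNA, written 5'->3') and the 16 S rRNA 3' end."""
    mrna = site.upper().replace("T", "U")
    allowed = WATSON_CRICK | WOBBLE if allow_gu else WATSON_CRICK
    best = 0
    # prev[j] holds the paired-run length ending at the previous mRNA base
    # and rRNA position j-1 (a longest-common-substring style recurrence).
    prev = [0] * (len(RRNA_3_TO_5) + 1)
    for base in mrna:
        cur = [0] * (len(RRNA_3_TO_5) + 1)
        for j, r in enumerate(RRNA_3_TO_5, start=1):
            if (base, r) in allowed:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best
```

Under these assumptions, `longest_pairing("TAAGGAGGT")` gives 9, because that sequence is the exact complement of the first nine rRNA nucleotides, while `longest_pairing("GGAGG")` gives 5.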

The co-ordinate expression of the three classes of T7 proteins correlates well
with the program of transcription during T7 infection, but, given the apparent
stability of T7 mRNAs and the relative abruptness of the shutoff of class I and
host protein synthesis (see Fig. 3), some type of regulation at the translational
level seems likely. Strome & Young (1978, 1980a,b) have presented evidence that
T7 late mRNAs outcompete the gene 0.3 mRNA for translation both in vivo and
in vitro, even though the 0.3 protein is the most actively synthesized early protein.
This suggests that, among T7 mRNAs, there may be a hierarchy of ability to
compete for translation, and that this may be an important factor in regulation of
protein synthesis during infection. Furthermore, the rate of synthesis of
individual proteins within a co-ordinately expressed group also appears to vary
more widely than might be expected from relative levels of mRNA, suggesting
that some of the mRNAs are translated much more efficiently than others.
Among the class I proteins, those that appear to be made most efficiently are the
gene 0.3, 1.3 and perhaps gene 1 proteins; among class II proteins, the gene 2.5, 5.5
and perhaps gene 3.5 and 5 proteins; and among class III proteins, the gene 8, 9,
10, 17 and perhaps gene 15 and 16 proteins (see Fig. 3).
The nucleotide sequences around the protein initiation sites have been
Table 11

Initiation sites for synthesis of T7 class II proteins

Potential pairing to 16 S rRNA

3'-AUUCCUCCACUAG-5'



T7 


Pairing 


Distance 




•engcn 


A to A I (.* 


14 


7 




1-6 


o 


1 1 
1 I 




« 


Q 


17 


q 


1 A 


IS 


o 




2 


o 


12 


i -7 


6 


1 1 




4 


10 


.3 






36 


7 


12 


■3^ 


6 


10 


^4 


6 


10 


(*/) 


5 


7 




7 


10 


(42) 


4 


8 


4-3 


6 


9 


4o 


5 


10 


4-7 


5 


10 


■5 


5 


10 


S3 


6 


9 


5-5 


5 


10 


-5-7 


8 


10 


6 


6 


12 


63 


6 


10 


Average* 


6-2+12 


UK>±12 



CTGCOTTT A TAAGCAG A C A ( TTTATGTTT 
CG A ( 'TO A CT A AAGGAGGT A C A 0 A C ( * ATG A TGT -\ CTT 
CGACTr\ArTAAA(iGAGAC\A(TATATr;TTTrGACTT( T 
(\AAA(:GAATCAAGGAGGTGTT( TG ATGGG ACTGTT a 
TG AT A A A C A TAAGGAT A A ATG TT A TG C ATA ATTTC A 
CTTTGGAAATCGAQAGGT CAATC ACT ATGT( A A A PC 
A(:GAAArCTAAAGGAGATTAArATTATGGrTAAGAA 
Gf 'AGACGAAGA('GGAGACTT( TAAGTGGA A( TGCCG 
ATATACGCAA AGGGAGG PC .XCXTfifii- AGGTT U*GG( * 
A AGG A A AGG A AAGCTAGG A A AG A A AT A A TGG CTOGTG 
TTTGTT('GrATTGGA^GT(\AAATAATGrGCAAGT(T 
TTTQTQGQQCTAG£A£GGAATTGCATGGAC A ATTf'G 
GTGACAACTGTGGGAGTAGTGATGGGAAOTCGOTGT 
CCG AA A C C CTC AGGAGGT A A A C OA ATG A CTT A C A AC 
A A C 0 0 AG AO AAAG£T A A AG 0 A 0 ATG AGG A AGGTOGO 
CGACTCArTAAAGGAGACACArrATGTTCAAArTGA 
AGT A ATC A A A 0 A££A<£ A A A C C ATT ATG T CT A A CGT \ 
CG A GTC ACT AT AG£AiS.AT ATTA 0 0 ATG CGTG AO C CT 
CGATAATCAATAGGAGAAATCAATATGATCGTTTCT 
G C C A CTG AT A C AGGAGG CT A ( 'T C ATG A A CG A A AG A C 
CAT A A AA CTAT AGGAG A A ATT ATT ATG G CT ATG AC A 
TGCAAC AGTAOGGGAGGTGTTCTG ATGTCTG ACT AC 
CTGAAACGAATGGGAGGATGTGTCTAATGTCTCGTG 
CTTT ATTG A C AAGGAfiATTT A C 0TGTG(5AG A C CGT A 



See the footnotes to Table 10.

* Averages do not include potential overlapping genes (4.1 and 4.2).

examined for regularities that might indicate a role in initiation for some site in
addition to the ribosome-binding sequence and the initiation codon, or something
that might correlate with the expression class of an mRNA or with the relative
efficiency of translation within a class. There is a slight trend toward longer
ribosome-binding sequences in going from class I (average length 5.3 ± 1.0) to class
II (6.2 ± 1.2) to class III (7.0 ± 1.4), but there are individual exceptions to the
trend. There is also a slight decrease in the distance between the ribosome-binding
sequence and the initiation codon in going from class I to classes II and III
(11.2 ± 1.3; 10.0 ± 1.2; 9.9 ± 1.8), again with individual exceptions. Thus, the length
of the pairing sequence and the distance to the initiation codon do not in
themselves provide sufficient information to assign individual mRNAs to one
class or another.

One fairly consistent finding is that the nucleotides between the last paired
nucleotide of the ribosome-binding sequence and the first nucleotide of the
initiation codon are relatively rich in A and T but deficient in G. This is
particularly true of the class II and III mRNAs, but again, there are individual
exceptions. For all 51 sites, 72% of the nucleotides in these positions are A or T
and only 9% are G; 27 sites have no G, 17 have one G, four have two G residues
Table 12

Initiation sites for synthesis of T7 class III proteins



T7 


Pairing 


Distance 


protein 


length 


A to ATG 


60 
67 


A 

4 

O 

O 


13 
12 




b 


12 


73 


Q 


1 1 


7 '7 




8 


S 


j 
1 


1 1 
1 1 


9 


7 


] | 


10 


(> 


10 


n 


7 


9 


12 


S 


8 


13 


6 


11 


14 


7 


11 


15 


8 


li> 


16 


8 


9 


17 


9 


* 


17o 


8 


9 


IK 


7 


10 


lfi-o 


6 


* 


US?) 


7 


9 


19 


8 


7 


(192) 


4 


9 


U9-3) 


6 


11 


19-5 


7 


9 


Average* 


70+J-4 


9*9 + 1-8 



Potential pairing to 16 S rRNA

3'-AUUCCUCCACUAG-5'



I ^ JTJ^T'TAAfiAfifiAATCTTT ATTATG TTA A< • A 
AAGAf - AT < ! ATGGGG x<:a XTTn 1 f ' A r " r " TTTCTTTf T 
AAGT((-G<ATTTQGAGGTAAGAA(;TGATC"TrTG ACT 
T^^^ ( ' TTTAAC(;Af;<;TATAA(;TTA T<^t;TAAG A A 

^'■T»; ^ A ™GAGA ( ACATTTAATGGC TGAGAA 
xTT ^ ^I^fiMAf AATAATAATOGCTC.A ATC 

TTTTT(.( T<iAAAiiGMiGAA<"TATATG('G('T('ATA('G 

AA T A l^ AfTA< ' ^iS£iA ^ 0 '^^ ^TTA J^-ATGA(■TAT 
it 4 IV A< ' 0AMiiSAi ^ ATAA( ' < ' ATA TC;TGTTGGGf • 
AAGA< ^ ; < ; ( ; A <»iTAATGAGCTATGAGTAA A A 
AAG £L TArA 2^*^ 

(MTTTACT TTAAGG AGGT <- A A atqT^t, 1 ,- rT 
ATC,GA(T(-TrAAgGAG(;TA(-AAGGTG(TAT(-ATTAG 
ty^a 1 ipw^^^^^ A A T AJ]G G A A A A ( JG A T 

TA A a^^T ^^ A ^^' A ^ A -^' '^ AGAA ^^'"-^ 
A A ™ (A ^fiAiiG_TA ( 'AC A ATG AGTACG TT A A 

^T^ AGGAACTTG ^2G ATAA f'^GTGGCTArACAA( > 
V^ A TS GAAGCTSG ^ G -6r rTrr «TGATGGCTACTrr 
CAAC ATAAAGOSA£SAGACTCA2eTT0CGCTT AT 



See the footnotes to Table 10.

* Averages do not include potential overlapping genes (18.7, 19.2 and 19.3).



and three have three G residues. This trend is even more pronounced in the 13
[...] sequence also tend to favor A and T [...] less strongly than the nucleotides
between the ribosome-binding sequence and the initiation codon. No consistent
relationship between nucleotide sequence ahead of the ribosome-binding sequence and
translational properties of these mRNAs has yet been identified.
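
The base composition of the spacer between the ribosome-binding sequence and the initiation codon is straightforward to tally. The following Python sketch shows one way to do it; the function name `spacer_composition` and the example spacers are hypothetical and are not taken from Tables 10 to 12.

```python
from collections import Counter


def spacer_composition(spacers):
    """Return (fraction A+T, fraction G) over all nucleotides in `spacers`,
    a list of DNA strings lying between the last paired nucleotide of the
    ribosome-binding sequence and the initiation codon."""
    counts = Counter("".join(s.upper() for s in spacers))
    total = sum(counts[b] for b in "ACGT")
    if total == 0:
        raise ValueError("no nucleotides supplied")
    at_fraction = (counts["A"] + counts["T"]) / total
    g_fraction = counts["G"] / total
    return at_fraction, g_fraction


# Hypothetical spacers, for illustration only.
at, g = spacer_composition(["ATATACAT", "TTTAACA", "AAGC"])
```

Applied to the real spacer sequences, such a tally should reproduce the figures quoted above (72% A or T, 9% G), provided the spacers are delimited in the same way as in the text.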

[...] codon in the first position past the initiation codon [...]
four different codons are used as the second codon [...] codons are used once,
eight are used twice, two are used three times [...]
(Fig. 3). The only other alanine codon used, GCA, is used twice [...]
low efficiency of translation. Why the use of GCU as the second codon should
correlate with high levels of protein synthesis is not known, but it will be [...]

(b) Other potential T7 proteins

[...] more of the 50 close-packed genes in a different reading frame. All but three of the
51 protein initiation sites have the sequence G-G-A-G or G-A-G-G [...]
length of the ribosome-binding sequence is shorter [...], although the
average distance between the ribosome-binding sequence and the initiation [...]
groups: almost [...] the initiation codons of the 73 [...]

Twenty-four of the 73 additional initiation sites would start a protein within
the coding sequence and in the same reading frame as one of the close-packed
genes, and would produce proteins ranging in size from [...]. The
other site would start a protein ahead of and [...]
gene 4.5 protein, and would continue to the termination site for the gene 4.5
protein. It is not known whether any of these 25 potential in-frame proteins are
[...]

The remaining 48 of the 73 additional initiation sites would initiate a protein
that would be read from the same sequence as one of the [...]
a different reading frame. Thirty of the proteins that could be made from
these initiation sites would be less than ten amino acids long [...]
would contain 40 or more amino acids. Four of [...]
rather unlikely to be made because: (1) [...]; (2)
there is only a four-nucleotide ribosome-binding sequence; and (3) the nucleotides
between the ribosome-binding sequence and the GTG are rich in G and C. This
leaves four candidates for additional T7 proteins, the potential gene 4.1, 18.7, 19.2
and 19.3 proteins, which would range in size from 39 to 84 amino acids. The
positions of the coding sequences for these four potential proteins are given in
Tables 2 and 3, and the nucleotide sequences around their initiation sites are
given in Tables 11 and 12.

Another way we have searched for potential T7 proteins in the nucleotide
sequence is to look for long open reading frames that do not correspond to one of
the close-packed T7 proteins, and to see whether they are headed by any sequence
resembling a known protein initiation site. In the entire l strand there are 20 open
reading frames 200 nucleotides or longer that do not correspond to one of the
close-packed T7 genes. Four of them contain the coding sequences for the four
potential overlapping proteins just discussed, and 15 have no likely initiation sites
within them; the remaining open reading frame contains a plausible initiation site
that would produce a protein of 112 amino acids, the potential gene 4.2 protein
(Tables 2 and 11).
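
A search for open reading frames of this kind can be sketched in a few lines of Python. This is an illustrative reconstruction, not the authors' procedure: the function name `open_reading_frames`, the `min_len` parameter, and the decision to treat the ends of the sequence as frame boundaries are all assumptions.

```python
STOPS = {"TAA", "TAG", "TGA"}


def open_reading_frames(seq, min_len=200):
    """Yield (frame, start, end) for every stretch of `seq` (one strand,
    5'->3') that is free of termination codons and at least `min_len`
    nucleotides long. Coordinates are 0-based and end-exclusive; a stretch
    runs from just past one stop codon (or the frame start) to the next."""
    seq = seq.upper()
    for frame in range(3):
        open_start = frame
        last = frame + 3 * ((len(seq) - frame) // 3)  # end of last full codon
        for pos in range(frame, last, 3):
            if seq[pos:pos + 3] in STOPS:
                if pos - open_start >= min_len:
                    yield frame, open_start, pos
                open_start = pos + 3
        if last - open_start >= min_len:
            yield frame, open_start, last
```

Each frame reported would still have to be compared with the known coding sequences and inspected for a plausible ribosome-binding sequence and initiation codon, as described in the text; the sketch covers only the first step.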

The five potential overlapping proteins identified in these two searches are of
reasonable size relative to the known T7 proteins. Furthermore, the initiation
sites for protein synthesis are generally similar to the initiation sites for the 51
close-packed T7 proteins, including the composition of the bases that lie between
the ribosome-binding sequence and the initiation codon. The coding sequence for
the potential gene 4.1 protein lies within the coding sequence for the 4A protein,
and the termination codon overlaps the initiation codon of the 4B protein. Such
an overlap of termination and initiation codons occurs eight times among the
close-packed T7 genes. The potential gene 4.2 protein would initiate within the
coding sequence of gene 4, go past the end of the gene 4 protein, and end at the
beginning of the φ4.3 promoter sequence. Termination of a coding sequence at the
start of a promoter sequence occurs six times among the close-packed T7 genes.
The coding sequence for the potential gene 18.7 protein would lie entirely within
the coding sequence of gene 18.5, and those for the potential gene 19.2 and 19.3
proteins entirely within the coding sequence of gene 19. The second codon for the
potential gene 19.3 protein would be GCU, the second codon used for the highly
expressed T7 proteins. It seems quite possible that one or more of these five
potential proteins is made during T7 infection, and we have some preliminary
genetic evidence that the potential gene 19.3 may in fact be expressed
(unpublished results).

(c) Frameshifting during translation
(i) The gene 10A and 10B proteins

Gene 10 specifies the major capsid protein of T7. However, gene 10 amber
mutants lack not only the major capsid protein (10A) but also a larger protein
(10B) that is made in much smaller amounts and is also found in phage heads
(Studier, 1972; see Figs 3 and 8). Both the 10A and 10B proteins are made in vitro
from purified gene 10 mRNA (Fig. 8). Examination of the nucleotide sequence in
the gene 10 region led us to the conclusion that the 10B protein may be made by a
shift in reading frame during translation of the 10A protein.
The 10A protein is predicted to begin at the AUG at nucleotide 22,966 and end
[...], and we have confirmed this by determining the amino acid
sequence at both ends of the 10A protein purified from phage particles. There is
no potential protein initiation site ahead of the 10A initiation site that could
produce the 10B protein (in the way the 4A and 4B proteins are produced), and in
fact the reading frame is open ahead of the 10A initiation codon for only five
amino acids. Readthrough of the UAA termination codon could add 25 amino
acids to the 344 amino acids of the 10A protein, but a -1 frameshift during
translation could add 53 amino acids, a number that seems more consistent with
the [...] 10A and 10B proteins upon gel electrophoresis. A
shift from reading frame 1, the reading frame of the 10A protein, to reading frame
3 at any of the 33 amino acids preceding the 10A termination codon would allow
protein synthesis to continue to the UAA at nucleotide 24,159. Such a
frameshifted protein would end just ahead of the stem-and-loop that can form at
the transcription termination signal for T7 RNA polymerase (Fig. 5), consistent
with the close-packed arrangement of the T7 genes.
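
The effect of a -1 shift on the translated product can be illustrated with a short Python sketch. The function `translate`, its parameters `shift_after` and `shift`, and the 21-nucleotide example sequence are hypothetical and chosen for illustration; the sequence is not taken from gene 10. The sketch uses the standard genetic code and reads until the first termination codon in whatever frame is current.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order.
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(seq, start=0, shift_after=None, shift=0):
    """Translate `seq` (DNA, 5'->3') from `start` to the first stop codon.
    If `shift_after` is given, the reading position moves by `shift`
    nucleotides (e.g. -1) once that many codons have been read."""
    seq = seq.upper()
    pos, protein = start, []
    while True:
        if shift_after is not None and len(protein) == shift_after:
            pos += shift
            shift_after = None  # shift only once
        codon = seq[pos:pos + 3]
        if len(codon) < 3:
            break  # ran off the end of the sequence
        residue = CODON_TABLE[codon]
        if residue == "*":
            break
        protein.append(residue)
        pos += 3
    return "".join(protein)


# Hypothetical sequence: frame 1 stops early, the -1 frame reads further.
example = "ATGAAATTTTAACGTGCTAAG"
in_frame = translate(example)                          # "MKF"
shifted = translate(example, shift_after=2, shift=-1)  # "MKILTC"
```

In this toy case the unshifted product ends at the first UAA, whereas the shifted ribosome bypasses it and terminates at a later stop codon in the new frame, which is the relationship proposed above for the 10A and 10B proteins.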

The open reading frame that could supply the additional amino acids to make
the 10B protein does not have a likely protein initiation site within it. The best
potential initiation site would include a G-A-G-G ([...]) and a GUG initiation
codon ([...]). However, the separation between the ribosome-binding sequence
and the GUG is at the outer limit observed for T7 proteins, and the U of the GUG
is preceded by five consecutive G residues and followed by two more. Initiation at
this site would produce a protein of 48 amino acids, but it seems rather unlikely
that this potential protein would be initiated.

To demonstrate that the 10B protein does require a coding sequence past the
10A termination codon at 24,001, fragments of T7 DNA were cloned in pBR322
and tested for ability to specify these two proteins during T7 infection. The cloned
fragments that were tested all began ahead of the φ10 promoter and ended at
various points within the coding sequence for the 10A or 10B protein, or past the
Tφ transcription termination site. Infection was by a gene 10 amber mutant so
that any 10A or 10B proteins specified by the cloned fragments could be observed.
The results are shown in Figure 8. When the cloned fragment contained the
complete coding sequence for the φ10 mRNA (lane 5), both the 10A and 10B
proteins were produced in normal amounts. When the cloned fragment ended
within the coding sequence for 10A (lane 6), neither the 10A nor 10B protein was
found at the normal position, but an equivalent amount of a protein shorter
than the 10A protein was produced. When the cloned fragment ended past the
termination codon for 10A but before the predicted termination codon for 10B
(lanes 7 and 8), the 10B protein was missing but the 10A protein was made in
normal amounts, exactly as predicted if the 10B protein arises by frameshifting.
The cloned fragment used in lane 8 ends beyond the first termination codon that
is past the 10A termination codon in the same reading frame, ruling out the
possibility that the 10B protein arises by readthrough to the next termination
codon.

Where does the shift from the 10A to the 10B reading frame occur? The 10A
protein is predicted to contain no tryptophan. The part of the 10B protein past
Fig. 8. T7 proteins specified by cloned fragments of T7 DNA in vivo or by purified
RNA in vitro. Proteins were labeled with [35S]methionine and analyzed by gel electrophoresis
essentially as described in the legend to Fig. 3. Samples 1 to 8 were analyzed on a gel of 12.5%
acrylamide and samples 9 to 12 on a 10% to 20% gradient gel, both with a 5% stacking gel. Samples 1
to 10 were labeled 16 to 20 min after infection at 30°C, and samples 11 and 12 were labeled in a cell-free
protein synthesis reaction for 40 min at 37°C. Infections were by: lanes 1, wild-type T7; 2, LG3, a
deletion mutant lacking genes 11 to [...]; 3, am17 in gene 9; 4 to 8, am13 in gene 10; 9, a double
mutant containing am17 in gene 9 and am13 in gene 10; 10, am17 in gene 9. The T7 mutants were
described by Studier (1969) and Studier et al. (1979). For samples 1 to 8, the host was E. coli HMS174
containing the following plasmids: lanes 1 to 4, pBR322; 5, pAR436; 6, pAR1003; 7, pAR1338; 8,
pAR1061. For samples 9 and 10 the host was a [...] derivative of E. coli B834 (Studier, 1981). The
plasmids contain fragments of T7 DNA inserted, by means of suitable linkers, into the BamHI site of
pBR322 in the silent orientation, and were constructed essentially as described by Studier &
Rosenberg (1981). The cloned fragments all begin at the ClaI site at nucleotide 22,856, ahead of the
φ10 promoter, and end at the following sites: pAR436 at the PvuII site at nucleotide 24,272, thereby
including all of 10A and 10B; pAR1003 at the HaeIII site at nucleotide 23,834, within 10A; pAR1338
at the HaeII site at nucleotide 24,015, within 10B; pAR1061 at the HaeIII site at nucleotide 24,090,
within 10B. Since a gene 10 amber mutant was used to infect these strains, any 10A and 10B proteins
observed must have been specified by the cloned fragments of T7 DNA. The positions in the gel
patterns of the 10A and 10B proteins, and of the nearby gene 13 and 9 proteins, are indicated. Lanes
11 and 12 contain the [35S]methionine-labeled products of protein synthesis in a cell-free system
prepared essentially as described by Goldman (1982), from E. coli BL15 (RNAase I-, rel) (Studier,
[...]). The protein synthesis reaction mixture for lane 11 contained no added RNA and that for lane
12 contained φ10 RNA that had been synthesized by purified T7 RNA polymerase (a gift from
[...] Fuller & C. Richardson), using pAR436 DNA as template. Rifampicin (20 µg/ml) was present to
inhibit endogenous transcription. Proteins from the in vitro reaction mixture were precipitated by 5%
(w/v) trichloroacetic acid and the pellets were washed with 90% (v/v) acetone before being dissolved in
sample buffer for electrophoresis.
the 10A termination codon would also contain no tryptophan. If the shift in
reading frame occurred before nucleotide 23,976, the 10B protein would contain
tryptophan. [...] tryptophan is found in the 10B protein, T7 proteins were labeled with
[3H]tryptophan during T7 infection and were analyzed by gel electrophoresis (not
shown). No label was found in the gene 3.5, 9 or 10A proteins, [...]
nucleotide sequence (Tables 14 and 15). Even after extended exposure of the
autofluorogram, no label was found in the position expected for the 10B protein,
indicating that the frameshift must occur between nucleotides 23,977 and [...].
This result also provides additional evidence that the 10B protein does not arise
by readthrough of the 10A termination codon, since such a readthrough protein
would contain two tryptophans.

The 10B protein is made at only a small percentage of the rate of the 10A
protein during T7 infection. Perhaps the frameshifting mechanism is utilized to
maintain a relatively constant ratio of the two proteins that [...]
incorporation of unequal amounts of the two proteins [...]
phage heads. The clones that make normal 10A but not 10B protein may be useful
in testing whether the 10B protein may have a special role in the structure or
assembly of phage heads. It might even be feasible, by deleting one nucleotide [...]

(ii) The gene 5.5 and 5.5-5.7 fusion proteins

Two [...] T7 proteins are both eliminated by an amber mutation in gene 5.5
(Studier, 1981). As with the 10A and 10B proteins, the shorter protein is made
[...] than the longer protein, and [...] the nucleotide
[...] frameshifting during translation [...]
termination codon would produce [...]
protein is predicted to contain 98 amino acids,
[...] protein [...] amino acids, and the 5.5-5.7 fusion protein 168
amino acids. The 5.7 protein has not been identified by gel electrophoresis, but
[...] mobility [...]
gene 5.5. It should be possible to confirm that the predicted 5.5-5.7 fusion protein
contains the 5.7 amino acid sequence by showing that the fusion protein is
eliminated by an amber mutation in gene 5.7. What role, if any, the 5.5-5.7 fusion
protein may have during T7 infection is not known.

(iii) The C5 deletion

The C5 deletion of T7 shortens the gene 0.3 protein, reduces the amount of gene 1
protein made, and causes a very small amount of a gene 0.3-1 fusion protein to
be made (Studier, 1973b). We have determined the exact location of the C5
deletion in the nucleotide sequence, which allows these effects to be explained.

[...] arose by a cross-over between two A-A-T-G-A-A sequences, the
first located within the coding sequence for the gene 0.3 protein, at nucleotide
123[...], and the second at the beginning of the coding sequence for the gene 1
[...] to a termination codon three codons past the
[...] producing a protein of 105 instead of [...] amino acids.
[...] sequence at 3168 is the initiation codon for the gene 1
protein [...] the C5 deletion has removed the ribosome-binding [...]
a much weaker ribosome-binding sequence [...]
the crossover point, or between the crossover point and the TAA at [...], a total
[...] protein,
translation would proceed to the [...] at 1331 and produce a protein of 134 amino
acids [...]
not be detected in the usual protein patterns during infection.
(iv) The potential gene 0.6A and 0.6B proteins

[...] Studier, 1981) of coding sequences for the gene 0.6 and 0.65
proteins. The best evidence that either of these proteins is made is the observation
that, during infection by certain deletion mutants of T7 ([...]), a
[...] has a size consistent with being a fusion protein that
potentially could have beS'thTSi I T observa tlon of a protein that 

for «v «L TT^MrtT 1 ' f """ ""^^ brnding se,uen« 

readmg frame 1 to 2 between nucleotides 1769 and H95 al ntenal o elhT 

deLons. Howevt ofTe T^TaT ST^K ^ ^ ^ ^ ^ 

20 wiruugn oi Uie i OA at 179o in the same reading frame 
could produce a protein of 120 amino acids and also explain the sizes of the fusion
proteins. At present, it is not possible to be certain just what proteins [...]
Figures and Tables of this paper it is assumed
that the frameshift mechanism is used, and that the 0.6A and 0.6B proteins [...]

(v) Frameshifting in general

[...] examples of frameshifting during
translation of T7 mRNAs, one was induced or revealed by a deletion but the
[...] of normal [...] functions during T7
infection. [...] pairs of overlapping proteins would be produced in a fixed
[...] smaller protein. Frameshifting has recently been
found to be involved in a somewhat different way in the expression of the [...]
gene of small RNA phages (Kastelein et al., 1982), and [...] natural
[...] depend on the nucleotide [...] in the region of the
frameshift and also on the physiological state of the cells ([...] &
Blumenthal, 1979; Atkins et al., 1979; Fox & Weiss-Brummer, 1980).
It seems quite possible that frameshifting may be used rather generally to control
[...]

[...] frameshifts [...] the gene 5.5 and 10 proteins are in the -1 direction, whereas the
frameshift in the C5 deletion mutant and that proposed for gene 0.6 are in the +1
direction. No particularly interesting homologies are apparent between the
nucleotide sequences in the two regions where the +1 frameshifts may occur.
However, a striking homology is found between the nucleotide sequences in the
two regions where the -1 frameshifts could occur: the sequence U UUC AAA
occurs in the same reading frame in the frameshifting region for both the gene 5.5
and 10 proteins. This homology suggests that the U UUC AAA sequence might be
involved in the frameshift, and the sequence itself suggests a possible mechanism:
perhaps phenylalanine transfer RNA, which reads both UUC and UUU,
[...] slips back one [...]. A shift of this type by phenylalanine
[...] has been proposed as one possible explanation for two examples of in vivo
[...] a yeast mitochondrial protein (Fox & Weiss-Brummer, 1980).
The φ10 mRNA is easy to isolate (see Fig. 7), and both the 10A and 10B proteins
are made from it by in vitro protein-synthesizing systems from E. coli, so
it may be possible to study the frameshifting reaction in vitro.

(d) Termination codons
If frameshifting occurs during translation of the gene 0.6 and 10 proteins, the
[...] T7 genes terminate [...]
codon is UAA, which is used 30 times, [...]
by UGA, 16 times, and UAG, six times. There are eight instances in which a
[...]
the sequence U-A-A-U-G, three times as U-G-A-U-G, and once as A-U-G-A. There
are six cases in which a termination codon constitutes the first three nucleotides of
the conserved sequence of a promoter for T7 RNA polymerase. The [...]
nucleotide past the termination codon is usually U (33 times), but [...]



Predicted amino acid sequences of T7 early proteins 
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flttSWITYfwv FQUlYEnWE NIRTOOiROT OCXKWlrrW «*flVPmYfl ni^i^ocr,. .n. ' " — 

IKYLEEVK rtEDEE RONRVPHtYfl OIFSVMRSEC I0LEFEOSCL «TJT«JV|R| lVR , m r IOL*Ofl£DL 

0-4 STTNVOYCLT AQTVLFYSOfl VRCCFMHSLfi HROlkElYEN WRIRLESAE 

O S «^MCLLr flLCLRVGflSF GKRtGVfiVGS TFTfCIIICl IKCflLRK 

rtlKHYVMPtH TSNGATVCTP OGFflrtKQRlE RLKRELRINR KiNKICSCYO R TH 

miTOIfWfil OfllKfiLPICE LDKRQCMUO UVEHVNSET CDCELTELNO RLEhQTJHMTT LKH Wnonr HM 

SOPflYTflfCR WrOGRPCIPN vt 0 VORHRGC tTVVLDRLRD ^wlnoHY ^«Tn^I! ^ aifWCF ^OOTSft «Y<mj_PW? VIKVGFKKEO 

.nrsNCOVPr hopvsfsok *og£f^ eel ke^ ESSS.' JS** 1 - Cklirkffeg IftS F0r*sc* 
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200 
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200 
400 
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CLSKPISUf IRPSCSLTII4 IC £r»™« ° ^^, r^^ Pta " W0 *' «""*™<. «SL«VSflLSS 

NTGEISEKVtt U1T.MG* LBr0VWSVt ™«5 gSEE J""?"" 

SflRKLUWEV KOWirCEILR KRCBVHHVIP OOFPVWETK KPIDTRUtn FLO***! ^* * !*S*SV!W RRVCWtM.K 

" c,e ^' «™«» t^^co ikes: jese ^r 00 ^ LR " rvv ^ 

WWEKHTK* SMMWOFER TKGRKLNKTK RORSmRS* CO 

GW.TS0HUW FMHMVa LOUWIYCOW rOflrtRKCC, RLRIEWSGN UOTSTFtH, OEOVLFWO ^ 

= SS S5S5i SEK SZE5 SffiK =5» "™"- — 

nvaaoL»Eo KRSEoea.1 vkopicitw gkkschJ™ pekrogiio o«££^ 251^ " oehvkwl loetffeieh obpestevto 

Sa*FFSP, G .COOCTIN ROMUIS XESf «*VICFEV LLESGRL-* , N[aBO oe Fre.vKERTL 



(e) Amino add sequences and compositions of T7 proteins 
The predicted amino acid sequences and compositions of the T7 proteins are 

EE 4™ I 3 to i* ^ ^ CakU,ated m ° ,eCular **** ™ 

Table 4. Enzymat.c and structural functions are known for many of these 

protems (Table 4). The amino acid sequences should be usefuJ for determining Z 
structure and b.ochemical interactions of the T7 proteins, and comparisons of the 
ammo acd sequences of T7 proteins and unrelated proteins thafhave similar 
functions might also be informative. 

The nucleotide sequence predicts that some large T7 proteins lack certain 
ammo ac.ds Reeve (personal communication) has verified that the gene 9, 15 and 
16 protems lack cysteine, and we have verified that the gene 3-5, 9 and 10 proteins 
lack tryptophan. i „ - K 

Knowing the locations of the coding sequences for the T7 proteins makes it
possible to analyze the frequency of codon usage. Analysis of specifically labeled
proteins by gel electrophoresis, as in Figure 3, can provide estimates of the
relative rates of synthesis of the different T7 proteins, so correlations between
level of expression and codon usage can be looked for. Superficial examination of
the frequencies of codon usage has not revealed any dramatic correlations, but
perhaps a more sophisticated analysis will be more revealing.
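
The tabulation that a known coding sequence permits can be sketched in a few lines of Python. This is only an illustration: the function name `codon_usage` and the short example sequence are hypothetical and are not taken from the T7 sequence.

```python
from collections import Counter

def codon_usage(coding_seq):
    """Count codons in an in-frame coding sequence (DNA or RNA letters)."""
    seq = coding_seq.upper().replace("U", "T")
    if len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    # Step through the sequence one codon (three nucleotides) at a time.
    return Counter(seq[i:i + 3] for i in range(0, len(seq), 3))

# Hypothetical example, not a T7 gene: ATG AAA AAG CTG TAA
usage = codon_usage("ATGAAAAAGCTGTAA")
for codon, count in sorted(usage.items()):
    print(codon, count)
```

Counts obtained this way for each gene could then be compared with estimated rates of synthesis, which is the kind of correlation discussed above.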
Table 17

Predicted amino acid sequences of T7 class II proteins

Table 18

Predicted amino acid sequences of T7 class III proteins

8. Origins of DNA Replication
The primary origin of replication of T7 DNA, that is, the preferred origin for
the first replication of parental DNA, has been mapped by Tamanoi et al. (1980)
and Saito et al. (1980) to the non-coding region between genes 1 and 1.1. A
secondary origin, utilized as the first origin when the primary origin is deleted, is
probably associated with the φOL promoter for T7 RNA polymerase, near the left
end of the DNA (Tamanoi et al., 1980; Dunn & Studier, 1981). We have confirmed
the location of the primary origin and identified other potential secondary origins
by analyzing the ability of cloned fragments of T7 DNA (Studier & Rosenberg,
1981) to serve as origins of replication in a plasmid during T7 infection
(unpublished results). Fragments representing all of the T7 DNA except the
region around the φOL promoter have been tested: relatively strong origin
activity is associated with the primary origin and with the φOR and the φ13
promoters; much weaker origin activity is associated with some other promoters.
All of the origins of T7 DNA replication identified in vivo contain a promoter
for T7 RNA polymerase, and T7 RNA polymerase is needed for proper initiation
(Scherzinger & Seiffert, 1975; Fischer & Hinkle, 1980; Romano et al.,
1981). The specificity of T7 RNA polymerase for its own promoters is an important
part of the mechanism for switching transcription from host DNA to T7
DNA during infection: T7 RNA polymerase is induced, the host RNA polymerase
is inactivated, and thereafter all transcription in the cell is directed to T7 DNA.
Apparently, the same strategy is used to switch replication from host DNA to T7
DNA: the T7 replication complex is induced, the host replication apparatus is
inactivated, and the T7 replication complex is designed so as to require an
interaction of T7 RNA polymerase with one of its own promoters in order to
initiate replication. This arrangement uses the specificity of T7 RNA polymerase
to direct all replication in the cell to T7 DNA. However, not all promoters for T7
RNA polymerase are parts of replication origins. Knowledge of the nucleotide
sequence, together with the cloning assay for origin activity, may make it possible
to determine precisely what is required to define an origin of replication for T7
DNA.
9. The Ends of T7 DNA, and the Concatemer Junction

The longest stretches of T7 DNA that do not code for any proteins are at the
ends of the molecule, positions 0 to 2.3, 98.0 to 98.6 and 99.0 to 100. The first and
last 0.4% of the molecule contain a perfect direct repeat of 160 base-pairs. T7
DNA is replicated as concatemers, that is, long molecules containing tandemly
repeated T7 genomes (Kelly & Thomas, 1969). At the junction between adjacent
genomes in a concatemer, the non-coding regions that will ultimately be at the
ends of the mature DNA flank a single copy of the terminal repetition (Langman
et al., 1978; our unpublished work): the terminal repetition of mature DNA is
generated during maturation and packaging. In Figure 9, the features of the
region between genes 19 and 0.3 are drawn to scale as they would be found in a
concatemer junction.

As pointed out (Dunn & Studier, 1981), the non-coding region at the left end of
mature T7 DNA contains several prominent features: from left to right, these
include the terminal repetition; a regular array of 12 short repeated sequences; an
A+T-rich region that contains the φOL replication origin; the A1, A2 and A3
promoters for E. coli RNA polymerase; the R0.3 RNAase III cleavage site; and
finally, the start of the coding sequence of gene 0.3 (see Fig. 9). Some rather
similar features are found near the right end of mature T7 DNA: from right to
left, prominent features include the terminal repetition; an array of 12 short
repeated sequences similar to that found near the left end; the coding sequence of
gene 19.5; an A+T-rich region that contains the φOR replication origin; and
finally, the end of the coding sequence of gene 19 (see Fig. 9). To facilitate
discussion, the region of DNA occupied by the array of 12 repeated sequences
adjacent to the terminal repetition at the left end of mature T7 DNA will be
referred to as SRL (short repeats, left end), and the similar region at the right end
of mature T7 DNA will be referred to as SRR (short repeats, right end).

Fig. 9. Concatemer junction of T7 DNA. The arrangement of sequence elements between the coding
sequences of genes 19 and 0.3 in a concatemer junction is shown. The locations of the single copy of the
terminal repetition (TR) and of the arrays of short repeated sequences (SRR and SRL) near the right
and left ends of mature T7 DNA are indicated. The leftward minor promoter for E. coli RNA
polymerase (A0) lies within SRL. The positions of the strong early promoters (A1, A2 and A3) and of
the first RNAase III cleavage site (R0.3) are also indicated. The φOR and φOL promoters for T7 RNA
polymerase, and their associated A+T-rich sequences, apparently serve as origins of replication in the
T7 DNA. The arrows indicate the direction of transcription from the promoters. The scale is in map
units.

SRL and SRR both contain 12 copies of the sequence C-C-T-A-A-A-G or
variants of it, arranged in two sets of six. The copies are less variable in sequence
and are more regularly spaced in SRL than in SRR and, in fact, the array in SRR
would probably not have been identified without SRL as an example. In SRL,
eight of the 12 copies are C-C-T-A-A-A-G and four copies differ from this sequence
in one position; in SRR, only two of the 12 copies are C-C-T-A-A-A-G, six differ
from this sequence in one position, three differ in two positions, and one differs in
three positions. The spacing of the repeated sequences in SRL (beginning at
nucleotide 175) is a very regular 13, 13, 12, 13, 13, 28, 13, 13, 13, 13, 13
nucleotides; the spacing in SRR (beginning at nucleotide 39,605) is a similar but
less regular 15, 13, 11, 13, 13, 29, 12, 11, 11, 12, 12 nucleotides. Additional
homologies between nucleotides adjacent to the basic repeated elements increase
the length of some perfect repeats to as long as 23 base-pairs in SRL (Dunn &
Studier, 1981) and as long as nine base-pairs in SRR. The longest perfect
homologies between SRL and SRR are nine base-pairs. SRL occupies 164 base-pairs,
from the first nucleotide of the first repeated element to the last nucleotide
of the 12th, and SRR occupies 159 base-pairs. SRL begins 15 nucleotides past the
end of the terminal repetition and SRR ends 14 nucleotides before the terminal
repetition. The leftward A0 promoter for E. coli RNA polymerase lies within
SRL: it occupies the interval between the two sets of six repeats, and overlaps the
last two repeats in the first set; RNA chains would begin within the first set of six
repeats, at nucleotide 224. No equivalent promoter has been identified within
SRR.
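
The positions and extents quoted for SRL and SRR can be cross-checked with simple arithmetic. The Python sketch below assumes that each spacing is the distance between the first nucleotides of successive repeats and that each repeat is seven nucleotides long; the first SRR spacing is taken as 15, the value implied by the stated 159 base-pair extent. All variable names are illustrative.

```python
TR_LENGTH = 160        # terminal repetition, base-pairs
GENOME_LENGTH = 39936  # base-pairs in mature T7 DNA
REPEAT_LENGTH = 7      # length of C-C-T-A-A-A-G

# Distances between the first nucleotides of successive repeats (assumption).
srl_spacings = [13, 13, 12, 13, 13, 28, 13, 13, 13, 13, 13]
srr_spacings = [15, 13, 11, 13, 13, 29, 12, 11, 11, 12, 12]  # first value inferred

def array_span(spacings, repeat_length=REPEAT_LENGTH):
    """Extent from the first nucleotide of the first repeat to the last of the 12th."""
    return sum(spacings) + repeat_length

srl_start, srr_start = 175, 39605
srl_span = array_span(srl_spacings)             # 164 base-pairs
srr_span = array_span(srr_spacings)             # 159 base-pairs
srr_end = srr_start + srr_span - 1              # last nucleotide of SRR
right_tr_start = GENOME_LENGTH - TR_LENGTH + 1  # first nucleotide of right-hand repetition

print(srl_span, srr_span)
print(srl_start - TR_LENGTH, right_tr_start - srr_end)
```

Under these assumptions the spacings reproduce the stated extents of 164 and 159 base-pairs, and the offsets of 15 and 14 nucleotides from the terminal repetition.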

The location of SRL and SRR relative to the terminal repetition suggests that
these arrays of short repeated sequences may have a role in forming the mature
ends of the DNA. Perhaps the nearby origins of replication associated with φOL
and φOR are also involved. Maturation of the DNA is known to be associated with
packaging into phage heads, and the gene 18 and 19 proteins are also required
(Studier, 1972). It is interesting that all of the elements identified across the
concatemer junction, including the terminal repetition itself, have an intrinsic
polarity and, although elements such as SRR and SRL, or φOR and φOL, are
approximately symmetrically positioned relative to the terminal repetition, the
polarities have the same orientation relative to T7 DNA itself and are therefore
not symmetrical about the terminal repetition. The molecular details of the
maturation and packaging process are far from clear, but perhaps knowledge of
the nucleotide sequence, and of the arrangement of elements around the
concatemer junction, will be helpful in working them out.



10. Other Features of the Nucleotide Sequence 
The nucleotide sequence given in Figure 1 for the l strand of T7 DNA contains
10,841 A residues, 9767 T residues, 10,291 G residues and 9037 C residues, which
amounts to 27.2% A, 24.5% T, 25.8% G and 22.6% C in the l strand. The double-stranded
molecule contains 20,608 residues each of A and T, and 19,328 residues
each of G and C, which amounts to 48.4% G+C. The molecular weight of the
sodium salt of T7 DNA is calculated to be 26.43 × 10^6.
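
The duplex totals follow directly from the single-strand counts, because each A in one strand pairs with a T in the other and each G with a C. A minimal Python check, with illustrative variable names:

```python
# Residue counts quoted in the text for one strand of T7 DNA.
strand_counts = {"A": 10841, "T": 9767, "G": 10291, "C": 9037}

total = sum(strand_counts.values())                      # length of the sequence
a_or_t_each = strand_counts["A"] + strand_counts["T"]    # A (= T) in the duplex
g_or_c_each = strand_counts["G"] + strand_counts["C"]    # G (= C) in the duplex
gc_percent = 100 * g_or_c_each / total                   # G+C content of the duplex

print(total, a_or_t_each, g_or_c_each, round(gc_percent, 1))
```

The counts sum to the 39,936 base-pairs of the complete sequence and give the quoted 20,608, 19,328 and 48.4% G+C.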

Uninterrupted runs of a single base occur much less frequently in T7 DNA than
would be predicted for a random sequence of nucleotides. No such runs of seven or
longer occur, and only one run of six is found in the l strand, which is the run of
six T residues found at the transcription termination site Tφ. In addition to this
run of six, the l strand contains T-T-T-T-T once, C-C-C-C-C and G-G-G-G-G three
times each, and A-A-A-A-A seven times.
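
A search for runs of this kind can be sketched as follows. This is an illustrative Python sketch, not the procedure used to obtain the counts above; the function name `homopolymer_runs` and the toy sequence are hypothetical.

```python
from itertools import groupby

def homopolymer_runs(seq, min_len=5):
    """Return (start, base, length) for each maximal single-base run; start is 1-based."""
    runs = []
    pos = 1
    for base, group in groupby(seq.upper()):
        length = sum(1 for _ in group)
        if length >= min_len:
            runs.append((pos, base, length))
        pos += length
    return runs

# Toy sequence (not T7): one run of six T, one of five C, one of four A.
toy = "ACG" + "T" * 6 + "A" + "C" * 5 + "G" + "A" * 4 + "G"
print(homopolymer_runs(toy))
```

Because only maximal runs are reported, a run of six is not also counted as two overlapping runs of five.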

Uninterrupted runs of pairs of nucleotides are also under-represented in T7
DNA. If we consider such runs of eight nucleotides or longer, only runs containing
A or G have an approximately random distribution of lengths, and then only in
the l strand: 126 such runs are found between eight and 14 long, plus one run of
22 (which occurs within the coding sequence of gene 3). For other pairs of
nucleotides, the number of runs eight or longer in the l strand is 80 of G,T; 59 of
A,C; 35 of A,T; 32 of C,T; and only five of G,C; the longest uninterrupted runs
are 14 of G,T; 13 of A,C; 17 of A,T; ten of C,T; and eight of G,C.
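
Runs restricted to two nucleotides can be located with a regular expression. The sketch below is illustrative only; the function name `pair_runs` and the toy sequence are hypothetical, and the minimum length of eight is taken from the text.

```python
import re

def pair_runs(seq, pair, min_len=8):
    """Return (start, run) for maximal runs made up only of the two bases in `pair`."""
    # A greedy character-class match of at least min_len yields maximal runs.
    pattern = re.compile("[%s]{%d,}" % (re.escape(pair), min_len))
    return [(m.start() + 1, m.group()) for m in pattern.finditer(seq.upper())]

# Toy sequence (not T7): a ten-nucleotide A/G run and a four-nucleotide one.
toy = "CCT" + "AGGAAGAGGA" + "TC" + "AGAG" + "T"
print(pair_runs(toy, "AG"))
print(pair_runs(toy, "CT"))
```

Applying the function to each of the six possible pairs of nucleotides would give the kind of tally reported above.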

The locations of most of the runs of pairs of nucleotides are within the coding
sequences for T7 proteins, as would be expected if these runs were distributed
essentially at random. An exception is runs containing only A or T, where 11 of
the 16 runs of ten or longer, and 19 of the 35 runs of eight or longer, are found in
non-coding sequences. This distribution presumably reflects a physiological role
for A+T-rich regions in opening the DNA at promoters and replication origins,
and perhaps in destabilizing RNA structures ahead of initiation sites for protein
synthesis. In addition, four of the runs containing only C or T are located just
downstream of promoters for T7 RNA polymerase, where they can base-pair with
the polypurine tract at the beginning of the RNA, and two such runs form parts
of the paired structure at RNAase III cleavage sites.

All perfectly repeated sequences of ten continuous base-pairs or longer in T7
DNA have also been identified. The longest repeated sequence in the molecule is
the terminal repetition, 160 base-pairs long. Because there are 17 promoters for T7
RNA polymerase, which share a high degree of homology over 23 continuous
base-pairs, large numbers of relatively long repeated sequences are found. A
random sequence 39,936 nucleotides long would be predicted to have an average
of less than one repeated sequence 15 base-pairs long or longer (Dunn & Studier,
1981), but T7 DNA has 76 such pairs of repeated sequences. The longest perfect
repeat involving promoters is 3u base-pairs, between the φ10 and φ13 promoters.
Other than the terminal repetition and repeats between promoter sequences, the
longest repeated sequences are one repeat of 23 and two of 20 within the SRL
sequence near the left end of T7 DNA (see section 9), and repeats of 17
between the A2 and A3 promoters and between the R0 7 and RNAase III
cleavage sites, all of which have been pointed out (Dunn & Studier, 1981).
Two repeats of 16 and three of 15 base-pairs are found between sequences that
are not parts of promoters, and these are repeats between coding sequences
in different genes. As repeats become shorter still, a larger fraction is found
between sequences that are not parts of promoters. However, no other class of
sequences analogous to promoter sequences has been identified in the DNA.
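
The expectation of fewer than one repeat of 15 base-pairs or longer in a random sequence can be illustrated with a rough estimate. The Python sketch below is not necessarily the calculation used in the cited work: it assumes direct repeats on a single strand, independent positions, and treats overlapping windows as independent. The function name `expected_repeat_pairs` is hypothetical.

```python
from math import comb

def expected_repeat_pairs(n, k, base_counts=None):
    """Expected number of pairs of positions sharing an identical k-mer in a
    random sequence of length n (direct repeats on one strand only)."""
    if base_counts is None:
        match_prob = 0.25  # four equally frequent bases
    else:
        total = sum(base_counts.values())
        # Probability that two independently drawn bases are identical.
        match_prob = sum((c / total) ** 2 for c in base_counts.values())
    windows = n - k + 1
    return comb(windows, 2) * match_prob ** k

uniform = expected_repeat_pairs(39936, 15)
# Using the single-strand base counts quoted in the text.
weighted = expected_repeat_pairs(
    39936, 15, {"A": 10841, "T": 9767, "G": 10291, "C": 9037})
print(round(uniform, 2), round(weighted, 2))
```

Both estimates fall below one, in line with the statement above, whereas 76 such pairs are observed in T7 DNA.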
Note added in proof: We have now found that purified T7 RNA polymerase initiates
RNA chains with ATP at both the φOL and φ2.5 promoters, as expected if RNA chains
begin at position +1 of the conserved promoter sequence (Table 7 of the text). However, a
minor but significant fraction of the chains were found to begin with GTP, presumably at
position +2 of the promoter sequence. At φOL, perhaps 20% of the chains began with
GTP, but at φ2.5 the fraction was much smaller. In contrast, all of the RNA chains
initiated at class III promoters have been found to begin with GTP at position +1 (Rosa,
M. D. (1979). Cell, 16, 815-825; Oakley, J. L., Strothkamp, R. E., Sarris, A. H. & Coleman,
J. E. (1979). Biochemistry, 18, 528-537). T7 RNA polymerase clearly has a strong
preference for initiating RNA chains at position +1 of the conserved promoter sequence,
but changes from the class III promoter sequence apparently can allow some chain
initiation at other positions.



